ABSTRACT


Discrete data sources arising from real problems are 
generally characterized by only partially known and varying 
statistics. This report provides the development and analysis 
of some practical adaptive techniques for the efficient noise- 
less coding of a broad class of such data sources. 

Specifically, algorithms are developed for coding discrete 
memoryless sources which have a known symbol probability 
ordering but unknown probability values. A general applica- 
bility of these algorithms to solving practical problems is 
obtained because most real data sources can be simply trans- 
formed into this form by appropriate preprocessing. 

These algorithms have exhibited performance only slightly 
above all entropy values when applied to real data with stationary 
characteristics over the measurement span. However, perfor- 
mance considerably under a measured average data entropy may 
be observed when data characteristics are changing over the 
measurement span. The latter observation is a result of the 
ability to adjust to both short term and long term variations 
in data characteristics. 

These techniques are applicable to virtually any alphabet 
size arising in practice. A subset of these results is the speci- 
fication and analysis of a large class of efficient adaptive coders 
for a binary memoryless source which is characterized by an
unknown or varying statistic p_0 (the probability of a zero). Again,
performance will be slightly above the binary entropy function
when p_0 is unchanging but will typically be well under a measured
average binary entropy when p_0 is changing over the
measurement span.

These techniques are both easy to use and amenable to 
practical high rate implementations. Functions of sums of 
data samples provide tight bounds to algorithm performance. 
Thus investigations of the effects of alternative algorithm or 
preprocessing configurations can be accomplished without the 
need for complete coder simulations. These same functions 
also serve to simplify internal decision making. Partially 
as a result of the unique cascading of variable length coding 
operations, the only implementation requirement for storage 
of code words is eight binary codewords of which the longest 
is five bits. 
I. INTRODUCTION


The basic problem we are addressing is one in which discrete data 
sources are to be coded into binary representations from which the original can 
be retrieved precisely. Thus these binary mappings are reversible. Standard
binary representations using a fixed number of bits/sample are the most obvious
and well known example of such mappings.

The statistical characteristics of most real data sources are such that certain
things happen more often than others. Then, quite intuitively, it should be
possible to reduce the average number of bits required by representing the 
frequently occurring events with fewer bits than the infrequent ones. Indeed 
this is the case. 

The vast majority of practical attempts at coding of real sources to 
remove this inherent statistical redundancy have used a particular approach, 
with quite predictable results. Specifically, after reversible preprocessing
to produce a near memoryless source (taking differences from predicted values,
run lengths, etc.) the usual approach to coding has been to determine a
probability distribution and then use the famous Huffman algorithm to obtain an
"optimum" variable length code. Unfortunately, this optimality is quite
restrictive. The Huffman derived code would give the best average performance
(lowest bits/sample) of any single prefix code on a source for which symbols
always occurred according to the assumed probability distribution. But the
statistical characteristics of almost any real data source change with time,
sometimes dramatically. The assumed probability distribution used to derive
and perhaps test the Huffman code might be utterly wrong part of the time when
monitored over a very long sequence. At the same time it may simply be a
distribution which is the result of averaging out many short term variations in
data characteristics. The point here is that there is room for improvement by
adapting the coding procedures to fit the changing data characteristics.

The theoreticians have only recently started attacking this problem in
earnest under the name "universal noiseless coding." A set of practical
adaptive variable length coding procedures was developed some time
ago for the specific task of providing efficient coding of spacecraft
imaging data where exact reproduction was a requirement. These algorithms
would now come under most people's definition of universal noiseless coding.
However, because of the rather specialized nature of the spacecraft application
the practical versatility of these algorithms to efficiently code a wide variety
of data sources has often been overlooked. This report will reintroduce them
in a way which will hopefully make their general applicability both obvious and
easy to accomplish. Using these algorithms as building blocks, more
sophisticated coding systems offering additional performance benefits will be
developed. The latter results are currently used in an imaging data compression
system called RM2.

PROBLEM DEFINITION AND BACKGROUND 

This section provides further discussion of the practical problem that 
subsequent chapters will address as well as the preliminary notation necessary 
to proceed. 

Some Basic Notation Conventions 

Concatenation. If X and Y are two sequences of samples then we can form 
a new sequence Z by running them back to back as 

Z = X*Y (1-1) 

using the asterisk as our basic indication of concatenation. However, we will
occasionally omit the * where no confusion should result.

Length of a Sequence. Any sequence of non-binary samples can be
represented by a sequence of bits using the familiar standard binary
representations which use a fixed number of bits. Without any anticipated
confusion, operator \mathscr{L}(·) will be used to specify the length of such a
sequence in samples or in bits (of its standard binary representation). Then if
X is a sequence of J samples

\mathscr{L}(X) = J samples (1-2)

and if the standard binary representation of X required 6 bits/sample

\mathscr{L}(X) = 6J bits (1-3)

If X is already a binary sequence (e.g., the result of coding) then
\mathscr{L}(X) will simply mean the length of X in bits.

General Form of Reversible Operators

It is instructive to characterize the general form that reversible operators
will take. Subsequent developments will then seem much less abstract. Let Z be
some sequence of, possibly correlated, data, and let \pi be a priori or side
information about Z. Then a reversible operator of Z would take the form

Z' = F[Z, \pi] = f_1[Z, \pi] * f_2[Z, \pi] * \cdots (1-4)

Each of the f_i[·, ·] represents a mapping operation of Z and \pi which
individually may not be reversible, but if all the f_i[Z, \pi] and \pi are
available then Z may be recovered precisely. That is, F[·, ·] has an inverse.

When

\mathscr{L}(Z') < \mathscr{L}(Z) (1-5)

in bits then we have achieved a more efficient binary representation of Z. If
this is true for many different Z, F[·, ·] may be a useful code operator.

Whether this is true or not, Z' can always be viewed as a sequence of symbols
to which reversible operators of the form in (1-4) can be applied. Thus we
might generate

Z'' = F'[Z', \pi] = F'[F[Z, \pi], \pi] (1-6)

or by relabeling

Z'' = F''[Z, \pi] (1-7)

Again, if \mathscr{L}(Z'') < \mathscr{L}(Z) for many different Z sequences,
operator F''[·, ·] may also be a useful code operator. We will make use of
these observations in later chapters. Complex sequence code operators will be
built up from simpler ones.

A Notational Convention 

The code operator structures that we will develop and are interested in 
identifying generally have many possible alternative internal parameters. Quite 
obviously, carrying all these parameters within discussions and block diagrams 
would present an unwieldy, if not impossible, notation problem. To avoid this 
we will simply omit explicit reference to internal parameters when naming 
operators. We adopt the convention of subscripting and superscripting the
symbol \psi (and occasional other symbols) to identify operators (structures)
and implicitly assume that a detailed specification can be obtained by
reference to a corresponding parameter string. The development of the "Basic
Compressor" in Chapter II will be used as a vehicle for introducing this
convention.

Simplifying the Problem 

A great many practical coding problems which may look quite different on
the surface can be transformed into coding problems with very similar charac- 
teristics. Consequently, a solution to a transformed coding problem can have 
general applicability. The form of this transformed coding problem will be 
developed in subsequent paragraphs. 

Removing correlation. In real problems where samples of Z are correlated
with themselves or a priori information \pi, there is usually some simple
transformation which results in new sequences Z' such that the samples are
approximately independent. More important, the uncertainty in what the sample 
values will be is usually greatly reduced. The less uncertainty there is the 
greater the potential for reducing the average bits required to code. This step 
is crucial in many practical problems but we will, for the most part, 
assume it is already accomplished. That is, data sequences will be assumed 
to be from approximately memoryless sources (no correlation). The user 
of algorithms to be developed here would then precede them with appropriate 
correlation removing transformations: operators of the form in (1-4). 

Examples of correlation removing transformations abound. Taking differences
between adjacent samples along a TV line results in approximately
independent difference samples which tend to be tightly distributed about zero
(less uncertainty). A priori information might be the preceding line;
appropriate use of this additional information generally leads to a similar
result but with samples more tightly distributed about zero (less uncertainty
still). A sequence of samples might also be successive states of a Markov
Source or run lengths from a run length coder.

Symbol labeling. Given q possible symbols from some source it is a
simple matter to relabel them into the numbers 0, 1, 2, ..., q-1. Unless noted
otherwise we will assume that such relabeling has already been done.

Symbol probability ordering. As part of the same problem, let P = {p_i}
be the probability distribution of symbols 0, 1, 2, ..., q-1. For a wide class
of practical problems the probability ordering of symbols after correlation
reducing operations is a priori known (or at least well approximated). In fact,
this ordering tends to remain the same (or close) even as the actual P may be
changing quite dramatically (e.g., consider again the independent difference
samples along a TV scan line). It is again a simple matter to relabel source
symbols, if necessary, so that the following conditions are well approximated.

p_0 \ge p_1 \ge p_2 \ge \cdots \ge p_{q-1} (1-8)

Preprocessing summary . Thus we will generally assume that data to be 
coded has been preceded by the reversible preprocessing operations summarized 
in Fig. 1-1. 
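
As an illustration of the preprocessing steps just summarized, the following is a minimal sketch in Python, not the report's own preprocessor. It takes differences between adjacent samples and relabels them as nonnegative integers. The particular relabeling (0, -1, +1, -2, +2, ... mapped to 0, 1, 2, 3, 4, ...), the function names, and the starting predictor value of 0 are all assumptions made for illustration; the report requires only that the relabeling be reversible and that (1-8) be well approximated.

```python
def to_ordered_symbols(samples):
    """Difference adjacent samples, then relabel the differences as 0, 1, 2, ...

    Assumed (hypothetical) relabeling: 0 -> 0, -1 -> 1, +1 -> 2, -2 -> 3, ...
    so that differences near zero receive the smallest labels.
    The first sample is differenced against an assumed predictor value of 0.
    """
    symbols = []
    previous = 0
    for s in samples:
        d = s - previous
        symbols.append(2 * d if d >= 0 else -2 * d - 1)
        previous = s
    return symbols


def from_ordered_symbols(symbols):
    """Invert to_ordered_symbols, recovering the original samples exactly."""
    samples = []
    previous = 0
    for m in symbols:
        d = m // 2 if m % 2 == 0 else -((m + 1) // 2)
        previous += d
        samples.append(previous)
    return samples


line = [10, 12, 11, 11, 15]
labels = to_ordered_symbols(line)
assert from_ordered_symbols(labels) == line  # the mapping is reversible
```

Whether this labeling actually satisfies (1-8) depends on the data; for difference samples distributed roughly symmetrically about zero it is a plausible choice.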

Changing P 

If the symbol probability distributions resulting from the preprocessing 
operations just described were always known and fixed then there would be 
little need to proceed any further. The standard procedure of using the
Huffman algorithm to derive an optimum variable length code for the known
distribution P would yield coding efficiency about as good as could be expected
(unless of course there is room for improvement of the preprocessing
operations). However, most real world problems are best characterized by a
"changing and poorly defined P."

Fig. 1-1. Reversible Preprocessing





Variations of P in real problems appear in many different ways. P may 
vary simply because separate short sequences are the result of preprocessing 
different data sources. There may be long and short term statistical variations 
in a single data source (e. g. , picture to picture variations in image data caused 
by totally different scenes, different camera, lighting, etc. and local variations 
for similar reasons). Other than the approximate probability ordering, (1-8),
P may not be known at all. In any case a selected "optimum" Huffman code may
perform quite poorly when the actual symbol distribution is different than the
assumed. This is basically the problem we seek a practical remedy for in
subsequent chapters.

Practical Measures of Performance 

Entropy definition. Given the discrete symbol probability distribution
P = {p_i} the entropy H(P) is defined by

H(P) = -\sum_i p_i \log_2 p_i bits/sample (1-9)

When properly used, H(P) can be a useful practical tool in assessing how well 
a particular coding algorithm performs. 
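
The definition in (1-9) can be evaluated directly from data. The sketch below is illustrative only: it assumes that P is estimated from a histogram of the samples at hand, and the function name measured_entropy is hypothetical.

```python
import math
from collections import Counter


def measured_entropy(samples):
    """Evaluate (1-9) with P estimated from a histogram of the samples."""
    counts = Counter(samples)
    total = len(samples)
    # Symbols that never occur contribute nothing to the sum.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Four equally likely symbols need 2 bits/sample.
print(measured_entropy([0, 1, 2, 3]))
```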

Interpretation of H(P) . If Z is an infinite sequence of samples from a 
memoryless source with fixed and known symbol probability distribution 
P = {p_i} then H(P) represents the minimum possible expected bits/sample
required to represent Z using any coding technique. But as we have just noted 
most practical problems which can be transformed by preprocessing into 
equivalent memoryless problems (Fig. 1-1) are characterized by changing or 
possibly unknown distributions. In practice it is generally difficult if not 
impossible to meaningfully model the way in which P changes, although the fact 
that it changes may be quite obvious. Consequently, the equivalent "bounds"
for real data sources with changing P are difficult to come by and we will not 
pretend to develop any here. Instead we will principally use (1-9) as a 
"practical measure of performance" rather than a bound on expected performance. 
The reader may consult the theoretical literature for performance bounds on 
idealized data sources. 

Except where explicitly noted, the stated performance of a particular code
operator \psi[·] will be based on measured performance on real data rather than
statistical expectation based on some idealized model. We will generally use a
span of K samples much greater than the length of sequence that \psi[·]
operates on. Similarly, an average symbol probability distribution, P, derived
from a histogram of the same K samples can be used to provide the desired
practical measure of performance H(P).

If the real data has a somewhat uniform statistical character over the
K samples then H(P) represents a practical bound to average per sample
performance of any \psi[·]. An algorithm is performing efficiently if its
measured average performance is close to H(P). However, if data character
changes significantly over the K samples then average per sample performance
under the measured H(P) may be possible by adapting the coding to suit the
changes. H(P) is still a useful guide in those cases and does in fact bound the
best performance available with a single code (e.g., a Huffman code designed
for P).

SUMMARY OF RESULTS

The principal result of Chapter II is the development of a code operator
called the Basic Compressor which provides measured performance close to
H(P) (P a priori unknown but stable over the measurement span) for values of
H(P) in the range of 0.7 to 4.0 bits/sample. This operator should have broad
applicability and Chapters III - V are examples of using it to generate new
operators with additional characteristics.

In Chapter III operators are developed which are capable of extending
these results to much higher values of H(P). Similarly, code operators are
developed in Chapter IV which are capable of providing average performance
close to H(P) for H(P) → 0 as well as the higher entropies. An outgrowth of
these developments is a class of binary memoryless coders capable of performing
close to the binary entropy function as the (a priori unknown) probability of
a binary zero or one varies between 0.0 and 1.0. These binary operators are
described in Chapter V.

In all cases, performance under H(P) has been observed for these 
operators when the data characteristics change substantially over the span of 
samples used for measurement. 

An extremely useful practical characteristic of these algorithms is that in 
each case accurate estimates (actually bounds) of actual performance can be 
obtained basically as simple functions of the sum of input samples. This allows 
for accurate performance assessments without the need for elaborate simula- 
tions involving the generation of bit streams. This can greatly simplify the 
determination of various parameters (e.g. , assessing a preprocessor) as well 
as aiding in the creation of new algorithms (which use the Basic Compressor as 
a basic tool). Additionally these same functions serve to simplify internal 
decision making. 
II. THE BASIC COMPRESSOR


This chapter will provide the development of an algorithm for efficiently 
representing blocks of J preprocessed (see Fig. 1-1) samples when P is 
unknown and the measured average entropy H(P) lies in the range of roughly 
0.7 to 4 bits/sample. This Basic Compressor operator will be used as a basic
tool in developing operators with additional characteristics in Chapters III - V.

As noted in Chapter I our primary means of identifying a large number of
different code operators will be to subscript and superscript the symbol \psi.
However, the first five operators \psi_0[·] through \psi_4[·] will receive dual
names to avoid confusion for those readers familiar with the original
description. The original names offer an additional benefit of being easy to
remember.

Several coding examples are given at the end of this chapter.

FUNDAMENTAL SEQUENCE 

Rather than seek a code which is optimum for some particular probability 
distribution, consider instead a code which is probably the simplest to implement
and determine the range of P over which it provides "good" performance.

Define the code word operator fs[·] by

fs[i] = \underbrace{0 0 0 \cdots 0 0 0}_{i zeroes} 1 (2-1)

where i is an input symbol. The length of codeword fs[i] is

\ell_i = \mathscr{L}(fs[i]) = i + 1 bits (2-2)

The coding of a J sample preprocessed sequence X = x_1 x_2 \cdots x_J using
fs[·] is given as

\psi_1[X] = FS[X] = fs[x_1] * fs[x_2] * \cdots * fs[x_J] (2-3)

and will be called a fundamental sequence. We have thus defined our first code
operator for sequences by \psi_1[·] = FS[·].

The length of a fundamental sequence is

F = \mathscr{L}(FS[X]) = \sum_{j=1}^{J} \mathscr{L}(fs[x_j]) = J + \sum_{j=1}^{J} x_j (2-4)

or simply the block size plus the sum of the input samples. Note that when X
is the all zero sequence FS[X] represents X with J bits.
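
The definitions in (2-1) through (2-4) translate almost directly into code. The following is a minimal sketch in which bit sequences are held as Python strings of "0" and "1" characters; that representation, and the decoder FS_decode, are conveniences assumed here for illustration rather than anything the report specifies.

```python
def fs(i):
    """Codeword of (2-1): i zeroes followed by a single one."""
    return "0" * i + "1"


def FS(X):
    """Fundamental sequence of (2-3): concatenated codewords fs[x_j]."""
    return "".join(fs(x) for x in X)


def FS_decode(bits):
    """Recover X by counting the zeroes that precede each one."""
    X, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            X.append(run)
            run = 0
    return X


X = [0, 2, 1, 0]
sequence = FS(X)
assert len(sequence) == len(X) + sum(X)  # F = J + sum of samples, as in (2-4)
assert FS_decode(sequence) == X          # the operator is reversible
```

Because every codeword ends in its only one, the decoder needs no knowledge of the alphabet size q, which reflects the independence from alphabet size noted in the text.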

Observations 

Because of the assumed probability ordering of input symbols in (1-8) and 
the codeword lengths in (2-2), shorter codewords will be used more often than 
longer ones. As noted in Chapter I, this condition can be well approximated for 
a wide variety of real problems by suitable preprocessing. The latter is
necessary to make the most of a given variable length code such as (2-1).

A useful practical characteristic of the particular code in (2-1) is the fact 
that its definition does not depend on input alphabet size. This assures a wide 
applicability of the results to follow in later paragraphs. 

FS Performance 

A plot of the average per sample performance of code operator FS[·] is
shown in Fig. 2-1 as a function of measured data entropy H(P) over K samples
where K » J.

The graph was derived from the results of preprocessing many forms of
data such that condition (1-8) was well approximated. The fact that many
distributions can yield the same entropy under these loose conditions is not of
any practical significance. The graph is not intended to be that precise. The
main point is that FS[·] performance tends to remain close to data entropy H(P)
in the range of roughly 1.5 to 3.0 bits/sample. If H(P) were always restricted
to this range then, considering the simplicity of fs[·], there would be little
point in doing anything else. This is not always the case, however.

FOUR CODE OPTIONS 

To extend good performance outside of this entropy range we could seek 
to find other codes which performed well for higher and lower entropies. But 
in addition we want such code operators to have the same versatility and
simplicity as FS[·]. In particular we wish these operators to be applicable to
any practical alphabet size, q, without substantially increasing complexity.
Before defining these alternative code operators we need some additional
definitions.

Additional Definitions

Sequence extension. Let Y = y_1 y_2 \cdots y_J be any J sample sequence. Given
the positive integer e > 1 an extended sequence is formed by terminating Y with
enough dummy zeroes to make the resulting sequence length a multiple of e
(i.e., e \lceil J/e \rceil) as in

Y^e = y_1 y_2 \cdots y_J 0 0 \cdots 0 (2-5)

where the trailing zeroes are the dummies. The e-th extension of sequence Y is
then obtained by simply grouping the consecutive J' = \lceil J/e \rceil
e-tuples of Y^e such as

Y' = z_1 z_2 \cdots z_{J'} = Ext^e[Y]
   = (y_1 y_2 \cdots y_e) * (y_{e+1} y_{e+2} \cdots) * \cdots * (\cdots y_{J-1} y_J 0 0 \cdots 0) (2-6)

Here \lceil a \rceil means the smallest integer greater than or equal to a.

If each sample of Y could take on any of q values then each sample
(except possibly the last) of Y' could take on any of q^e values. For our
purposes we will use the reversible sequence operator Ext^e[·] when Y is binary
so that samples of Y' can take on 2^e values.

As an example, let Y be the 34 sample binary sequence 

Y = 1111100001111000011000000000111011 (2-7) 

then the 3rd extension of Y is given as

Y' = Ext^3[Y] = (111) * (110) * (000) * (111) * (100) * (001)
              * (100) * (000) * (000) * (011) * (101) * (100) (2-8)

where we have added two dummy zeroes to complete the \lceil 34/3 \rceil = 12th
sample of Y'.
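
The extension operator can be sketched in a few lines and checked against the example in (2-7) and (2-8). Binary sequences are again held as strings of "0" and "1" characters, which is an assumption of this sketch; the function name ext is likewise illustrative.

```python
import math


def ext(Y, e):
    """e-th extension of (2-6): pad Y with dummy zeroes, then group e-tuples."""
    n_tuples = math.ceil(len(Y) / e)
    padded = Y.ljust(e * n_tuples, "0")
    return [padded[k:k + e] for k in range(0, len(padded), e)]


Y = "1111100001111000011000000000111011"  # the 34 sample sequence of (2-7)
Y_prime = ext(Y, 3)
assert len(Y_prime) == 12         # ceil(34/3) = 12 samples
assert Y_prime[-1] == "100"       # last sample completed by two dummy zeroes
```

Recovering Y from the extension requires knowing the original length J so that the dummy zeroes can be discarded.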

Complementation. Given any binary sequence we let

COMP[·] (2-9)

denote the operation of complementing each bit of the sequence (i.e., ones
complement).

Coding FS[x] 

Instead of seeking to code X directly in a standard way we instead attempt 
to remove statistical redundancy that may be present in a fundamental sequence. 
First let 

\overline{FS}[X] = COMP[FS[X]] (2-10)

be the result of complementing each bit of fundamental sequence FS[X] and
call it "FS bar." Now define

a = Ext^3[FS[X]] = a_1 a_2 \cdots (2-11)

b = Ext^3[\overline{FS}[X]] = b_1 b_2 \cdots (2-12)

where a is the \lceil F/3 \rceil sample 3rd extension of a fundamental
sequence, and b is similarly the 3rd extension of its complement (see the
example in (2-7) and (2-8)). By these operations we have simply lumped the
fundamental sequence, and its bit by bit complement, into 3-tuples and added
enough dummy zeroes (none, one or two) to complete the last 3-tuple.

8-word code. A simple 8-word variable length code is defined in
Table 2-1.


Table 2-1. 8-word code, cfs[·]

Input 3-tuple a_i     Output codeword cfs[a_i]
-----------------     ------------------------
0 0 0                 0
0 0 1                 1 0 0
0 1 0                 1 0 1
1 0 0                 1 1 0
0 1 1                 1 1 1 0 0
1 0 1                 1 1 1 0 1
1 1 0                 1 1 1 1 0
1 1 1                 1 1 1 1 1

If we view the bits making up a fundamental sequence as approximately 
binary memoryless then 3-tuples with more zeroes will tend to occur more 
often when zeroes are more likely than ones (active data). Under these
conditions the assignment of codeword cfs[a_i] to binary 3-tuple a_i in Table
2-1 assures that shorter codewords will be used more often than longer ones.
Sequence a in (2-11) is ready for a direct application of cfs[·].

When binary ones are more likely than zeroes (inactive data) the situation
is reversed; 3-tuples with more ones are more likely. In this case the
assignment of shorter codewords in Table 2-1 to the more likely 3-tuples can be
accomplished by flipping the right-hand column over or by simply complementing
all input 3-tuples. Then sequence b in (2-12) represents a preprocessed
fundamental sequence which is ready for a direct application of cfs[·].

We are now ready to define two additional code operators called "code fs"
and "code fs bar". These are defined as follows

\psi_2[X] = CFS[X] = cfs[a_1] * cfs[a_2] * \cdots (2-13)

and

\psi_0[X] = \overline{CFS}[X] = cfs[b_1] * cfs[b_2] * \cdots (2-14)

where the a_i and b_i are defined in (2-11) and (2-12).

As will be seen shortly, the fact that a binary memoryless model is not a 
perfect match for a fundamental sequence is of no practical significance. 
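
The two operators in (2-13) and (2-14) can be sketched as follows. The codeword table is Table 2-1; the string representation of bit sequences and the function names are assumptions of this illustration, and the helper functions restate the fundamental sequence and 3rd extension so that the block stands alone.

```python
# Table 2-1: input 3-tuple -> output codeword.
CFS_TABLE = {
    "000": "0",
    "001": "100",
    "010": "101",
    "100": "110",
    "011": "11100",
    "101": "11101",
    "110": "11110",
    "111": "11111",
}


def fundamental_sequence(X):
    """FS[X]: each sample x becomes x zeroes followed by a one."""
    return "".join("0" * x + "1" for x in X)


def third_extension(bits):
    """Pad with dummy zeroes to a multiple of 3 and split into 3-tuples."""
    padded = bits + "0" * (-len(bits) % 3)
    return [padded[k:k + 3] for k in range(0, len(padded), 3)]


def code_fs(X):
    """CFS[X] of (2-13): code the 3-tuples of the fundamental sequence."""
    tuples = third_extension(fundamental_sequence(X))
    return "".join(CFS_TABLE[t] for t in tuples)


def code_fs_bar(X):
    """'Code fs bar' of (2-14): code the 3-tuples of the complemented FS."""
    complemented = fundamental_sequence(X).translate(str.maketrans("01", "10"))
    return "".join(CFS_TABLE[t] for t in third_extension(complemented))


X = [0, 2, 1, 0]            # FS[X] = 1001011, F = 7
print(code_fs(X))           # codes the 3-tuples 100, 101, 100
print(code_fs_bar(X))       # codes the 3-tuples 011, 010, 000
```

Note that the complement is taken before the dummy zeroes are appended, following the order of operations in (2-10) and (2-12).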

Block diagram. Block diagrams describing code operators CFS[·] and
\overline{CFS}[·] are shown in Fig. 2-2. We maintain the dual notation of
\psi_2[·] and \psi_0[·] for later use.

Fig. 2-2. Block diagrams for CFS[·] and \overline{CFS}[·]


All zero sequence. In the special case when X is the J sample all zero
sequence we have

\mathscr{L}(CFS[X]) = 5 \lceil J/3 \rceil bits (2-15)

and

\mathscr{L}(\overline{CFS}[X]) = \lceil J/3 \rceil bits (2-16)

Recall that these numbers compare with J bits required by operator
FS[·].

Unity Code Operator

We can trivially add a fourth code option to the possibilities by defining

\psi_3[X] (2-17)

as any fixed length binary representation of X. In the simplest case we can
take \psi_3[X] as X itself.

\psi_3[X] = X (2-18)

However, recall that in many applications the X sequence we are coding is the
result of reversible preprocessing operations (see Fig. 1-1). The function of
such operations is to remove correlation and, by relabeling, produce a symbol
stream for which the desired probability ordering in (1-8) is well approximated.
However, these operations may also effectively increase the alphabet size.
For example, taking differences between adjacent samples increases the number
of possible sample values by a factor of about two. Thus a direct fixed length
binary representation of a preprocessed X sequence may actually require more
bits than a direct fixed length representation before preprocessing. In this
case it would be more advantageous to interpret \psi_3[X] as a fixed length
binary representation before reversible preprocessing.

Keeping in mind this more general interpretation of \psi_3[·] we will for the
most part assume the special case in (2-18).
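
The growth in alphabet size caused by differencing is easy to quantify. The short sketch below assumes, purely for illustration, samples with q = 256 possible values; the function name fixed_length_bits is hypothetical.

```python
def fixed_length_bits(alphabet_size):
    """Bits/sample needed by a fixed length binary code for the alphabet."""
    return max(1, (alphabet_size - 1).bit_length())


q = 256                                       # assumed example: 8-bit samples
bits_before = fixed_length_bits(q)            # samples take values 0 .. q-1
bits_after = fixed_length_bits(2 * q - 1)     # differences take -(q-1) .. q-1
print(bits_before, bits_after)
```

Under this assumption the differences need one more bit per sample than the original samples, which is why a fixed length representation taken before preprocessing can be the better interpretation of the unity operator.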

Average Performance 

A block diagram showing the four code operators \psi_i[·] discussed thus far
is shown in Fig. 2-3. Measured average performance for these operators is
shown in Fig. 2-4. The graph for FS[·] = \psi_1[·] has been transferred from
Fig. 2-1. Again, performance has been measured over spans of K samples much
greater than the J sample length of X. Other than making q or J too small to be
meaningful, the values of these parameters have little influence on the
location of these curves. However, an input alphabet size of q = 2^5 was chosen
to show the fixed position of the \psi_3[X] = X curve.

The three curves for \overline{CFS}[·], FS[·] and CFS[·] are almost a perfect
match in the sense that when one starts performing poorly (away from the 45°
entropy line) another starts performing well. This should not be surprising
since CFS[·] and \overline{CFS}[·] obtain improvements over FS[·] by coding
redundancy left in 3-tuples of FS[X]. If FS[·] is performing close to H(P) then
there can be little redundancy left. Otherwise CFS[·] or \overline{CFS}[·]
would be performing under the entropy. For individual fixed coding algorithms
this is impossible.

The main observation is that these three code operators offer options
which can provide efficient average performance when data entropies lie,
roughly, in the range of 0.7 bits/sample to 4 bits/sample. Operator definitions
do not depend on alphabet size, q. The additional unity operator, \psi_3[·], is
available free of charge in most digital systems. Thus we should be able to
provide a simple yet efficient adaptive coder (universal coder) by selecting
between these four options.
ADAPTIVE CODER

We are now ready to define an adaptive coder which selects a code option
from the four choices \psi_0[·] = \overline{CFS}[·], \psi_1[·] = FS[·],
\psi_2[·] = CFS[·] and \psi_3[·]. Letting ID identify the selected code
operator for a given J sample input block, a "Basic Compressor" output takes
the form

\psi_4[X] = BC[X] = ID * \psi_{ID}[X] (2-19)

where the concatenated ID is assumed to be a 2-bit binary number whereas, as
a subscript to \psi, it takes on the values 0, 1, 2 or 3.

Observe that ID is really a function of X which partitions the space of all
input sequences into four decision regions and takes on the corresponding values
0, 1, 2 or 3. (Carrying the full ID[X] would obviously cause notational
problems.) It remains to specify this decision rule to complete the definition
of \psi_4[·] = BC[·] in (2-19).

Optimum Decision 

The most straightforward, and in fact optimum, selection procedure is to 
simply choose the code operator output sequence which is the shortest. While 
this might be considered brute force, the simplicity of the code options makes 
this approach quite feasible in many applications. We will provide a simpler 
procedure later. The optimum decision rule can be stated simply as 
Choose ID such that

\mathscr{L}(\psi_{ID}[X]) = \min_i \mathscr{L}(\psi_i[X]) (2-20)

The Basic Compressor code operator would then require

\mathscr{L}(BC[X])/J = 2/J + \mathscr{L}(\psi_{ID}[X])/J (2-21)

bits/sample to code an input data block X, where the second part is assumed
to be the smallest of four possibilities.
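
A brute force version of the optimum decision in (2-20) can be sketched as below. This is an illustration under stated assumptions, not the report's implementation: bit sequences are Python strings, the unity option is taken as a fixed length code with a caller supplied bits_per_sample (every sample is assumed to fit in that many bits), and ties are broken in favor of the smallest ID. None of these choices is specified in the text.

```python
# Table 2-1: input 3-tuple -> output codeword.
CFS_TABLE = {
    "000": "0", "001": "100", "010": "101", "100": "110",
    "011": "11100", "101": "11101", "110": "11110", "111": "11111",
}


def fundamental_sequence(X):
    """FS[X]: each sample x becomes x zeroes followed by a one."""
    return "".join("0" * x + "1" for x in X)


def code_3_tuples(bits):
    """Pad to a multiple of 3 with dummy zeroes and apply Table 2-1."""
    padded = bits + "0" * (-len(bits) % 3)
    return "".join(CFS_TABLE[padded[k:k + 3]] for k in range(0, len(padded), 3))


def unity(X, bits_per_sample):
    """Assumed unity option: fixed length binary code for each sample."""
    return "".join(format(x, "0{}b".format(bits_per_sample)) for x in X)


def basic_compressor(X, bits_per_sample):
    """BC[X] of (2-19): 2-bit ID followed by the shortest of four options."""
    fs_bits = fundamental_sequence(X)
    fs_bar_bits = fs_bits.translate(str.maketrans("01", "10"))
    options = [
        code_3_tuples(fs_bar_bits),   # ID 0: code fs bar
        fs_bits,                      # ID 1: fundamental sequence
        code_3_tuples(fs_bits),       # ID 2: code fs
        unity(X, bits_per_sample),    # ID 3: unity operator
    ]
    # min returns the first shortest option, so ties go to the smallest ID.
    best_id = min(range(4), key=lambda i: len(options[i]))
    return format(best_id, "02b") + options[best_id]


quiet_block = [0] * 16
coded = basic_compressor(quiet_block, bits_per_sample=5)
print(coded, len(coded) / len(quiet_block), "bits/sample")
```

A decoder would read the 2-bit ID, then invert the selected option; for the options built on 3-tuples it also needs the block size J to discard dummy zeroes.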

The overhead cost in identifying the code choice is 

2/J bits/sample (2-22) 

which could obviously be diminished to insignificance by increasing J. However,
other considerations guide the choice of block size J.

A measured average of the second term in (2-21) will tend to decrease
as J is decreased because of the ability to switch codes (adapt) more frequently.
This effect will more than compensate for increasing overhead until eventually
overhead dominates. Thus there should be a best block size. Runs on various
forms of data by the author and Spencer and May suggest that this best
block size lies in the range of 16 to 25.† The main observation is that J is not
a critical performance parameter and can be chosen primarily for implementation
considerations. We will emphasize a block size of 16 for these reasons.
Later we will have reason to consider variable block sizes.

Average Performance 

A graph of the measured average performance of Basic Compressor
operator \psi_4[·] = BC[·] of (2-19) using either the optimum decision criterion
in (2-20) or a simplified rule to be introduced later is shown in Fig. 2-5. As
in earlier graphs H(P) is the average measured data entropy over a span of K
input samples, where K » J. A range of possible results is shown within
the crosshatched area, depending on the variability of data statistics over the
measurement span of K samples. When distributions are stable then average
performance slightly above H(P) is typical (e.g., 0.25 bits/sample) throughout
the range of 0.7 \le H(P) \le 4. However, when data distributions are quite
variable over the K samples then performance considerably under H(P) is
possible (we are assuming that probability ordering in (1-8) remains well
approximated as distributions vary).

† Test cases originated from image data.

Comments. Note again that J is not a critical parameter. The rough
performance description in Fig. 2-5 would generally be unaffected provided J
is not so small that overhead becomes dominant or so large that the advantage
of adapting disappears. If alphabet size q is very small then a performance
description up to 4 bits/sample is not meaningful since a fixed length binary
representation can do better. However, Hilbert has obtained good results
using the Basic Compressor on processed data which has small q
(e.g., q = 3, 4, 5, ...).

The main point should not be missed: the Basic Compressor can be 
expected to give efficient performance over a wide range of data entropies 
with no prior knowledge of distribution P (ordering in (1-8) excluded). In 
later chapters we will extend efficient performance first to higher data 
entropies and subsequently to very low entropies near zero. In these cases 
and others the code operators that result are essentially generated by pre- 
processing data into forms which can make effective use of the Basic 
Compressor. 
USEFUL BOUNDS AND ESTIMATES

There are some situations where it is desirable to minimize the
computation and memory requirements needed to make Basic Compressor
decisions and to estimate performance. The relationships between
$\psi_0[\cdot]$, $\psi_2[\cdot]$ and $FS[\cdot]$ may be used to accomplish this,
yielding the functions of input X, $\gamma_0$, $\gamma_1$, $\gamma_2$ and
$\gamma_3$,†† where

    $\gamma_0(X) \equiv \lceil F/3 \rceil + 2(F - J) \ge \mathcal{L}(\psi_0[X])$ †    (2-23)

    $\gamma_1(X) \equiv F = \mathcal{L}(\psi_1[X])$  (see Eq. 2-4)    (2-24)

    $\gamma_2(X) \equiv \lceil F/3 \rceil + 2J \ge \mathcal{L}(\psi_2[X])$    (2-25)

and

    $\gamma_3(X) \equiv$ constant    (2-26)

These are all simple functions of F which is itself just the sum of data samples
plus the Basic Compressor block length (2-4).
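
The bounds in (2-23) through (2-26) can be evaluated from a block of samples with a few additions. The following is a minimal Python sketch, not part of the original report: the helper name `gamma_bounds` is hypothetical, and the constant in (2-26) is assumed to be the fixed-length cost of the block (bits per sample times J). The sample values are those of block $X_1$ used in Example 1 of the EXAMPLES section.

```python
from math import ceil

def gamma_bounds(samples, bits_per_sample):
    """Return (F, [gamma_0, gamma_1, gamma_2, gamma_3]) for one block X.

    gamma_3 is assumed to be the fixed-length cost m = bits_per_sample * J.
    """
    J = len(samples)
    F = J + sum(samples)          # fundamental sequence length, Eq. (2-4)
    third = ceil(F / 3)           # number of 3-tuples needed to cover F bits
    m = bits_per_sample * J       # assumed value of the constant in (2-26)
    return F, [third + 2 * (F - J), F, third + 2 * J, m]

# Block X_1 of Example 1 (J = 16, 4-bit samples)
x1 = [0, 0, 0, 0, 0, 4, 0, 0, 0, 4, 0, 9, 0, 0, 1, 0]
F, bounds = gamma_bounds(x1, 4)
print(F, bounds)   # 34 [48, 34, 44, 64]
```

The smallest entry is $\gamma_1 = F = 34$, which agrees with the choice of $FS[\cdot]$ made for this block in Example 1.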


† $\lceil a \rceil$ means the smallest integer greater than or equal to a.

†† The functions $\gamma_1$ and $\gamma_3$ are trivial and $\gamma_0$ and $\gamma_2$ are easily
derived. Suppose $\overline{FS}$ is a string of all zeroes; then by Table 2-1 each 3-tuple
is coded into one bit so that $\mathcal{L}(\psi_0[X]) = \lceil F/3 \rceil$. Now start changing
zeroes of $\overline{FS}$ into ones. Each change will increment $\mathcal{L}(\psi_0[X])$ by at most
2 bits. The increase will be less than this only if an all ones 3-tuple is
created. Since $\lceil F/3 \rceil$ is the number of 3-tuples and F-J the number of ones
in $\overline{FS}$ we get $\mathcal{L}(\psi_0[X]) \le \lceil F/3 \rceil + 2(F-J)$. Using the same
argument on FS, using J in place of (F-J), yields $\gamma_2$ in (2-25).
The relationship of the $\gamma_i(X)$ is shown more clearly by the straight-line
approximation in Fig. 2-6. The lower envelope of these curves, shown with
heavy lines, is essentially the simplified decision rule we are seeking.

Simplified Decision Rule

Instead of comparing the actual bit counts for the four options as in (2-20)
we could instead use the following rule. Since the $\gamma_j(\cdot)$ are really
functions of F, we choose ID such that

    $\gamma_{ID}(F) = \min_j \gamma_j(X)$    (2-27)

This rule simplifies further to the rule in Table 2-2 where we assume
that fixed length coding of X by $\psi_3[\cdot]$ requires m bits, and m > 3J.

The expression 3(m-2J) is the approximate result of solving $\gamma_2(X)$ in
(2-25) for the F which gives $\gamma_2(X) \ge m$.
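
The threshold form of the rule (Table 2-2) reduces the decision to three comparisons on F. The sketch below is illustrative only: the function name `simplified_decision` is hypothetical, and assigning a boundary value of F to the lower-numbered option is an assumption about how ties are resolved.

```python
def simplified_decision(F, J, m):
    """Return the code option ID (0..3) chosen from the FS length F alone.

    J is the block size and m the fixed-length cost in bits (m > 3J assumed).
    Boundary values of F go to the lower-numbered option (an assumption).
    """
    if F <= 3 * (J // 2):
        return 0              # psi_0: coded complement of the FS
    if F <= 3 * J:
        return 1              # psi_1: the FS itself
    if F <= 3 * (m - 2 * J):
        return 2              # psi_2: coded FS
    return 3                  # psi_3: fixed-length representation

print(simplified_decision(34, 16, 64))   # 1
print(simplified_decision(68, 20, 80))   # 2
print(simplified_decision(20, 20, 80))   # 0
```

The three calls use the (F, J, m) values of Examples 1, 2 and 3 in the EXAMPLES section and reproduce the decisions reached there.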

Using these rules the Basic Compressor operator, $\psi_4[\cdot]$ in (2-19), could
require no more than $2 + \gamma_{ID}(F)$ bits to code X. But then certainly a coder
which used the optimum criterion in (2-20) could require no more than
$2 + \gamma_{ID}(F)$ bits either. Thus $\gamma_{ID}(F)$ may be used to bound the
performance of either system. In particular, we have

    $\mathcal{L}(\psi_4[X]) = \mathcal{L}(BC[X]) \le \gamma_4(X) \equiv 2 + \gamma_{ID}(F)$    (2-28)

We can now see that $\gamma_{ID}(F)$ is the lower envelope to the curves in Fig. 2-6.

Note that there are two critical points on the graph. Below $F = 3J/2$,
$\psi_0[\cdot] = \overline{CFS}[\cdot]$ will always perform better than $FS[\cdot] = \psi_1[\cdot]$,
while above $F = 3J$ (and less than $3(m-2J)$) $\psi_2[\cdot] = CFS[\cdot]$ will always
perform better than $FS[\cdot]$. In these cases the simplified decision criterion
in (2-27) would make the same decisions as the optimum criterion in (2-20).
Between these two operating points there is some possibility that either
$\psi_0[\cdot]$ or $\psi_2[\cdot]$ might perform better on a given X than operator
$FS[\cdot] = \psi_1[\cdot]$, which would be chosen by the simplified rule. But
experience on real data has shown that the vast majority of the time $FS[\cdot]$ is
indeed the best choice. There is also a remote possibility that $\psi_2[\cdot]$ could
perform better than $\psi_3[\cdot]$ when $F > 3(m-2J)$. Again, experience indicates
that $\psi_3[\cdot]$ is usually the best choice. Thus the difference in performance
between a Basic Compressor using the optimal decision rule and one which
uses the simplified rule seems to be insignificant from a statistical point of
view.

Table 2-2. Another Form of Simplified Decision Rule†

    Condition on Fundamental Sequence Length      Operator Decision

    $F \le 3\lfloor J/2 \rfloor$                        $\psi_0[\cdot]$
    $3\lfloor J/2 \rfloor < F \le 3J$                   $\psi_1[\cdot]$
    $3J < F \le 3(m-2J)$                                $\psi_2[\cdot]$
    $F > 3(m-2J)$                                       $\psi_3[\cdot]$

† $\lfloor a \rfloor$ means the greatest integer less than or equal to a.

$\gamma_{ID}(F)$ as an Estimate

The measured performance of the Basic Compressor operator $\psi_4[\cdot]$
shown in Fig. 2-5 is a plot of the long term average of Eq. (2-21). We have
just noted that the choice of decision rule would have a negligible impact on
these results. In addition we have the following useful observation: a long
term average of

    $2 + \gamma_{ID}(F)$    (2-29)

will tend to yield very nearly the same results. That is, in a statistical sense
we have, for either decision rule,†

    $E\{\mathcal{L}(\psi_4[X])\} \le 2 + E\{\gamma_{ID}(F)\}$    (2-30)

and

    $E\{\mathcal{L}(\psi_4[X])\} \approx 2 + E\{\gamma_{ID}(F)\}$    (2-31)

Thus the bound in (2-28) is statistically tight.

This is an extremely useful result. In simple terms it means that Basic
Compressor performance can be estimated (and bounded) quite closely by
adding up input samples (Eq. 2-4) and then computing $\gamma_{ID}(F)$ using the simple
expressions in (2-23) through (2-26). The actual coder need not be implemented
to determine performance.

Looser Bounds

Note from Fig. 2-6 that the function $\gamma_{ID}(F)$ is convex $\cap$. Thus we have

    $E\{\gamma_{ID}(F)\} \le \gamma_{ID}(\bar{F})$    (2-32)

where

    $\bar{F} = E\{F\}$    (2-33)

Direct substitution in (2-30) yields a looser bound. Whereas $E\{\gamma_{ID}(F)\}$ takes
into account the ability to switch codes each J samples, $\gamma_{ID}(\bar{F})$ assumes
only the ability to pick the best of four codes once.

† $E\{\cdot\}$ denotes expectation.
A practical application of (2-32) to bounding the performance of the
Basic Compressor over a long sequence $Y = y_1 y_2 \cdots y_K$, where $K \gg J$, is quite
straightforward. Determine $\bar{F}$ as†

    $\bar{F} = J + \frac{J}{K} \sum_{i=1}^{K} y_i$    (2-34)

(that is, the average fundamental sequence length of the K/J blocks) and then
bound average per sample performance by

    $\frac{2}{J} + \frac{\gamma_{ID}(\bar{F})}{J}$    (2-35)

Variable J. In all cases so far we have assumed that block size J was
fixed. However, we will later have reason to consider variable block sizes.
Here we derive a useful result for later use. First supplement the function
$\gamma_{ID}(F)$ with J as in $\gamma_{ID}(F, J)$. If we fix F at F' and plot $\gamma_{ID}(F', J)$ as a
function of J we would find it is also convex $\cap$. Then we have

    $E\{\gamma_{ID}(F, J) \mid F = F'\} \le \gamma_{ID}(F', \bar{J})$    (2-36)

where

    $\bar{J} = E\{J\}$    (2-37)

But then if both F and J vary

    $E\{\gamma_{ID}(F, J)\} \le E\{\gamma_{ID}(\bar{F}, J)\}$    (2-38)

    $\le \gamma_{ID}(\bar{F}, \bar{J})$    (2-39)

† K is assumed to be a multiple of J.

We recognize (2-38) as the previous result in (2-32). Thus (2-39)
provides a slightly weaker bound to Basic Compressor performance.

BLOCK DIAGRAM 

A block diagram of the Basic Compressor using the simplified decision 
rule is shown in Fig. 2-7. 

EXAMPLES 
Example 1 

Let the input sequence block size be J = 16 and the alphabet size q = 16.
Then suppose

    $X_1$ = 0,0,0,0,0,4,0,0,0,4,0,9,0,0,1,0    (2-40)

Fixed length. A standard binary representation of $X_1$ is easily
obtained as a sequence of sixteen 4-bit codewords. Then, obviously

    $\mathcal{L}(\psi_3[X_1]) = 64$    (2-41)

Fundamental Sequence. A fundamental sequence is obtained
using (2-1) and (2-3) yielding

    $FS[X_1] = \psi_1[X_1]$ = 1,1,1,1,1,00001,1,1,1,00001,1,0000000001,1,1,01,1    (2-42)
where we have separated individual codewords by commas. The length of this
"FS" sequence can be obtained by counting or by using (2-4)

    $F = \mathcal{L}(\psi_1[X_1]) = J + \sum_{i=1}^{J} x_i = 34$    (2-43)

$\overline{FS}[X_1]$ is obtained from (2-42) by complementing each bit. Sequences
a and b in (2-11) and (2-12) are obtained by grouping the bits of $FS[X_1]$ and
$\overline{FS}[X_1]$ into 3-tuples and adding enough dummy zeroes at the end to complete
the last 3-tuple.

    a = $\mathrm{Ext}^3[FS[X_1]]$
      = 111, 110, 000, 111, 100, 001, 100, 000, 000, 011, 101, 100    (2-44)

    b = $\mathrm{Ext}^3[\overline{FS}[X_1]]$
      = 000, 001, 111, 000, 011, 110, 011, 111, 111, 100, 010, 000    (2-45)


Now applying the code cfs[·] in Table 2-1 to sequences a and b as in
(2-13) and (2-14) respectively we get

    $\psi_2[X_1] = CFS[X_1]$
      = 11111, 11110, 0, 11111, 110, 100, 110, 0, 0, 11100, 11101, 110    (2-46)

and

    $\psi_0[X_1] = \overline{CFS}[X_1]$
      = 0, 100, 11111, 0, 11100, 11110, 11100, 11111, 11111, 110, 101, 0    (2-47)

By adding bits in (2-46) and (2-47) or by adding codeword lengths when using
Table 2-1 we get

    $\mathcal{L}(\psi_2[X_1]) = 40$,  $\mathcal{L}(\psi_0[X_1]) = 42$    (2-48)

Thus the optimum decision criterion of (2-20) would select
$\psi_1[X_1] = FS[X_1]$ as the coded output sequence to use. $\psi_1[X_1]$ would be preceded
by a two-bit identifier for a total of 36 bits.

Using the simplified rule in (2-27) would have yielded the same results.
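
Example 1 can be reproduced with a short program. The following Python sketch is an illustration rather than the report's implementation: the function names are hypothetical, the 3-tuple codebook is inferred from the codewords appearing in (2-44) through (2-47) because Table 2-1 is not reproduced in this section, and a two-bit binary identifier equal to the operator index is assumed.

```python
# 3-tuple code inferred from (2-44)-(2-47); Table 2-1 itself is not shown here.
CFS_CODE = {
    "000": "0",     "001": "100",   "010": "101",   "100": "110",
    "011": "11100", "101": "11101", "110": "11110", "111": "11111",
}

def fundamental_sequence(samples):
    """Code each sample x as x zeroes followed by a one."""
    return "".join("0" * s + "1" for s in samples)

def complement(bits):
    """Complement every bit of a bit string."""
    return bits.translate(str.maketrans("01", "10"))

def cfs(bits):
    """Pad with dummy zeroes to a multiple of 3, then code each 3-tuple."""
    padded = bits + "0" * (-len(bits) % 3)
    return "".join(CFS_CODE[padded[i:i + 3]] for i in range(0, len(padded), 3))

def code_options(samples, bits_per_sample):
    """Return the four candidate outputs psi_0 .. psi_3 for one block."""
    fs = fundamental_sequence(samples)
    fixed = "".join(format(s, "0{}b".format(bits_per_sample)) for s in samples)
    return [cfs(complement(fs)), fs, cfs(fs), fixed]

def basic_compressor(samples, bits_per_sample):
    """Optimum decision: count bits, keep the shortest option, prefix its ID."""
    options = code_options(samples, bits_per_sample)
    ident = min(range(4), key=lambda i: len(options[i]))
    return format(ident, "02b") + options[ident]

x1 = [0, 0, 0, 0, 0, 4, 0, 0, 0, 4, 0, 9, 0, 0, 1, 0]
print([len(o) for o in code_options(x1, 4)])   # [42, 34, 40, 64]
print(len(basic_compressor(x1, 4)))            # 36
```

The printed lengths match (2-41), (2-43) and (2-48), and the 36-bit total matches the result above.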
Example 2 

As another example take J = 20, q = 16 and

    $X_2$ = 1,0,1,1,5,3,0,1,3,7,1,2,2,0,1,2,2,2,8,6    (2-49)

Then generating an FS and breaking it into 3-tuples as before we get

    a = $\mathrm{Ext}^3[FS[X_2]]$ = 011, 010, 100, 000, 100, 011, 010,
                                    001, 000, 000, 010, 100, 100, 110,
                                    100, 100, 100, 100, 000, 000, 100,
                                    000, 010    (2-50)
Again using Table 2-1 we could generate $\psi_2[X_2] = CFS[X_2]$. Instead we put
down the corresponding codeword lengths, the sum of which equals
$\mathcal{L}(CFS[X_2])$

    5,3,3,1,3,5,3,3,1,1,3,3,3,5,3,3,3,3,1,1,3,1,3    (2-51)

From (2-50) and (2-51)

    $F = \mathcal{L}(FS[X_2]) = 68$    (2-52)

and

    $\mathcal{L}(CFS[X_2]) = 63$    (2-53)

$\mathcal{L}(\psi_3[X_2]) = 80$ and $\mathcal{L}(\psi_0[X_2]) > 80$ so that the optimum decision is
$\psi_2[\cdot]$.

Simplified test. The same decision results from the use of Table 2-2
where

    $60 < F < 120 \Rightarrow \psi_2[\cdot]$    (2-54)

Estimate. Using $\gamma_2(X_2)$ in (2-25) we have

    $\gamma_2(X_2) = \lceil F/3 \rceil + 2J = 23 + 40 = 63$    (2-55)

Using the result in (2-55) we see that the bound in (2-25) is achieved,
$\mathcal{L}(\psi_2[X_2]) = \gamma_2(X_2)$.

Example 3 

Now take J = 20 again but suppose $X_3$ is a sequence of zeroes

    $X_3$ = 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0    (2-56)

Using (2-4), F = 20 and

    $\gamma_{ID}(F) = \gamma_0(X_3) = \lceil F/3 \rceil = 7$    (2-57)

Actually going through the coding procedures for $\psi_0[\cdot]$ yields the seven bit
binary sequence

    $\psi_0[X_3]$ = 0000000    (2-58)

Adding two bits for identification of the code option gives from (2-19)

    $\psi_4[X_3] = BC[X_3]$ = 00 * 0000000    (2-59)

Example 4 

Now suppose we have the sequence

    $Y = X_2 * X_3$    (2-60)

where $X_2$ is the sequence of example 2 and $X_3$ is the all zero sequence of
example 3.

Letting $\psi_5[\cdot]$ denote the operation of using the Basic Compressor on each
separately we have

    $\psi_5[Y] = \psi_4[X_2] * \psi_4[X_3]$    (2-61)

Using the previous results we have

    $\mathcal{L}(\psi_5[Y]) = 4 + 63 + 7 = 74$ bits    (2-62)

Recall that either an exact bit count or use of the estimator $\gamma_{ID}(\cdot)$ yields this
result.
Now compute the average FS length from (2-34) as

    $\bar{F} = \frac{1}{2}(88) = 44$    (2-63)

Then using $\bar{F}$ in (2-23) to (2-26), the minimum yields

    $\gamma_{ID}(\bar{F}) = \bar{F}$    (2-64)

Average performance of Y is bounded by (2-35)

    $\frac{2}{20} + \frac{44}{20} = 2.3$ bits/sample    (2-65)

This compares with actual average performance derived from (2-62)

    $\frac{74}{40} = 1.85$ bits/sample    (2-66)

The reader can check that he would obtain essentially the same result as (2-65)
by coding Y directly using the Basic Compressor (i.e., $\psi_4[Y]$). The overhead
term, 2/20, in (2-65) would be reduced to 2/40. The key observation is that
using $\psi_4[\cdot]$ on all of Y gives up the ability to adjust the coding to variations in
data character, which in the case of $X_2$ and $X_3$ are quite extreme. Here, the
advantages of adapting are much more significant than the slight increase in
overhead.
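
The comparison in Example 4 needs only the block sums. The sketch below is a hedged illustration (the helper name `gamma_id` is hypothetical, and the fixed-length cost m = 80 bits assumes 4-bit samples with J = 20); it evaluates the block-by-block estimate and the looser bound obtained from the average FS length.

```python
from math import ceil

def gamma_id(F, J, m):
    """Lower envelope of the four bounds (2-23) to (2-26) for FS length F."""
    third = ceil(F / 3)
    return min(third + 2 * (F - J), F, third + 2 * J, m)

J, m = 20, 80                      # block size and assumed fixed-length cost
fs_lengths = [68, 20]              # F for X_2 (2-52) and for the all-zero X_3

# Block-by-block estimate: 2 ID bits plus gamma_ID(F) for each block.
adaptive_bits = sum(2 + gamma_id(F, J, m) for F in fs_lengths)
adaptive_rate = adaptive_bits / (J * len(fs_lengths))

# Looser bound: evaluate gamma_ID once at the average FS length.
F_bar = sum(fs_lengths) / len(fs_lengths)
loose_rate = (2 + gamma_id(F_bar, J, m)) / J

print(adaptive_bits, adaptive_rate, F_bar, loose_rate)
```

Under these assumptions the adaptive estimate is 74 bits (1.85 bits/sample) and the looser bound is 2.3 bits/sample, the same figures as (2-62), (2-66) and (2-65).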

OTHER OPERATOR DEFINITIONS 
Operator $\psi_5[\cdot]$

Let Y be an N sample sequence which is a priori partitioned
into $\eta$ smaller blocks, $Y_i$, so that

    $Y = Y_1 * Y_2 * \cdots * Y_\eta$    (2-67)

composed of $J_i$ samples each where

    $N = \sum_{i=1}^{\eta} J_i$    (2-68)

Then we define the block by block Basic Compressor coding of Y by the
operator $\psi_5[\cdot]$ where

    $\psi_5[Y] = \psi_4[Y_1] * \psi_4[Y_2] * \cdots * \psi_4[Y_\eta]$    (2-69)

An example of $\psi_5[\cdot]$ was given in (2-61).

By defining $\psi_5[\cdot]$ we have lumped all the possible lengths of Y and all the
possible ways of partitioning Y into the $Y_i$. In most cases N and a specific
partitioning would be fixed for a given application. The most obvious and
practical situation is when N is a multiple of some fixed Basic Compressor block
size J.

Variable N. In Chapters IV and V we introduce an application for which
N is a priori unknown for each Y sequence but is available at a decoder for
decoding purposes (i.e., it is transmitted separately). Then we define

    $\psi_6[\cdot]$    (2-70)

as a code operator which codes Y by first choosing a preselected block
partitioning assigned to N and then applies the corresponding form of $\psi_5[\cdot]$ to Y.
Whereas there are many possible rules for partitioning, the following seems to
be quite suitable and practical:

Partition Y into $\eta$ blocks, where

    $\eta = \lfloor N/J \rfloor$ for N > J,  $\eta = 1$ otherwise    (2-71)

and the first $\eta - 1$ blocks have J samples and the last, $N - (\eta - 1)J$ samples. That is

    $J_i = J$ for $i < \eta$,  $J_i = N - (\eta - 1)J$ otherwise    (2-72)

Quite obviously, $\psi_5[\cdot]$ is a special case of $\psi_6[\cdot]$ for which the length of Y
is predetermined and fixed.
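
The partitioning rule of (2-71) and (2-72) can be sketched in a few lines of Python. The function name `partition_lengths` is hypothetical, and the integer part of N/J is taken to be the floor; that reading is an assumption, chosen because it reproduces the 16, 16, 20 partition of the N = 52 sequence used in the Chapter III example.

```python
def partition_lengths(N, J):
    """Block lengths J_i for an N sample sequence and nominal block size J.

    The number of blocks is floor(N / J) when N > J and 1 otherwise, so the
    last block absorbs the remainder and holds between J and 2J - 1 samples.
    """
    eta = N // J if N > J else 1
    return [J] * (eta - 1) + [N - (eta - 1) * J]

print(partition_lengths(52, 16))   # [16, 16, 20]
print(partition_lengths(10, 16))   # [10]
```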

Performance Bounds and Estimates 

The function $\gamma_4(\cdot)$ in (2-28) provides a useful bound and estimate of the
performance of Basic Compressor operator $\psi_4[\cdot]$. It is equally desirable, and
a simple matter, to assign similar functions to more complex operators as
they are developed. That is

    $\mathcal{L}(\psi_j[Y]) \le \gamma_j(Y)$ and $\mathcal{L}(\psi_j[Y]) \approx \gamma_j(Y)$    (2-73)

for each $\psi_j[\cdot]$. As an example and a result for future use we have from the
developments above, for j = 5 or 6,

    $\mathcal{L}(\psi_j[Y]) \le \gamma_j(Y) = \sum_{i=1}^{\eta} \gamma_4(Y_i)$    (2-74)

and can expect that typically

    $\mathcal{L}(\psi_j[Y]) \approx \gamma_j(Y)$    (2-75)
III. EXTENDING PERFORMANCE TO HIGH DATA ENTROPIES


Using the Basic Compressor to directly code long sequences drawn from
average distributions $\bar{P}$ which have entropies greater than 4 bits/sample results
in relatively inefficient performance as noted in Fig. 2-5. That is, the curve
for average performance will move away from $H(\bar{P})$. This chapter addresses
that problem and provides a simple means for achieving efficient performance
at the higher entropies.

SPLIT SAMPLES 

Let $\tilde{M}^n$ be some sequence of N preprocessed samples for which the
probability ordering of (1-8) is satisfied. The superscript n signifies that the
standard binary representation for $\tilde{M}^n$ requires n bits/sample. Define the
"split sample" operator $SS^m[\cdot]$ by

    $SS^m[\tilde{M}^n] = \tilde{M}^m * \tilde{L}^k$,  m + k = n    (3-1)

where

    $\tilde{L}^k$ = N sample sequence consisting of the k least significant
                   bits of each sample of $\tilde{M}^n$    (3-2)

and

    $\tilde{M}^m$ = N sample sequence consisting of the m = n-k most
                   significant bits of each sample of $\tilde{M}^n$    (3-3)

and $SS^n[\tilde{M}^n] = \tilde{M}^n$.

Clearly $SS^m[\cdot]$ is reversible since each sample of $\tilde{M}^n$ can be reconstructed
by combining the corresponding samples of $\tilde{M}^m$ and $\tilde{L}^k$. We can therefore
concentrate on the efficient coding of $\tilde{M}^m$ and $\tilde{L}^k$, from which $\tilde{M}^n$ can be retrieved.

A sample block diagram showing $SS^m[\cdot]$ is provided in Fig. 3-1.
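
The split and its inverse amount to a shift and a mask on each sample. The following Python sketch is illustrative only; the function names `split_samples` and `merge_samples` are hypothetical and the samples are assumed to be non-negative integers.

```python
def split_samples(samples, k):
    """Split each sample into its most significant part and its k LSB's."""
    msb = [s >> k for s in samples]
    lsb = [s & ((1 << k) - 1) for s in samples]
    return msb, lsb

def merge_samples(msb, lsb, k):
    """Inverse of split_samples: rebuild each original sample."""
    return [(m << k) | l for m, l in zip(msb, lsb)]

block = [3, 5, 1, 0, 22, 14]              # 5-bit samples, so n = 5
msb, lsb = split_samples(block, 1)        # m = 4, k = 1
print(msb, lsb)                           # [1, 2, 0, 0, 11, 7] [1, 1, 1, 0, 0, 0]
print(merge_samples(msb, lsb, 1) == block)
```

Because the merge recovers every sample exactly, nothing is lost by coding the two parts separately.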
Fig. 3-1. Split Sample Operator, $SS^m[\cdot]$



ENTROPY CONSIDERATIONS 


Definitions 

Let $\bar{P}_m$ be the average distribution of the N m-bit samples from $\tilde{M}^m$.
Let $\beta(m)$ denote the per sample performance of the Basic Compressor
operator applied to sequence $\tilde{M}^m$. That is

    $\beta(m) = \frac{\mathcal{L}(\psi_\ell[\tilde{M}^m])}{N}$    (3-4)

where $\ell$ = 5 or 6 (see 2-71 and 2-72).

Observations 

If the distribution $\bar{P}_n$ on the original input samples $\tilde{M}^n$ approximates the
desired ordering in (1-8) then so does each $\bar{P}_m$ on the m most significant bit
samples. In addition as m decreases (fewer significant bits in each sample)
the distributions $\bar{P}_m$ become more peaked around zero and the entropies $H(\bar{P}_m)$
decrease. For our purposes here we note from experimental observation that
if $H(\bar{P}_m)$ exceeds approximately 4 bits/sample

    $H(\bar{P}_{m-1}) \approx H(\bar{P}_m) - 1$    (3-5)

That is, one less bit of quantization reduces the sample entropy by
approximately one bit. This provides the key to obtaining efficient performance at
higher entropies.

Example 

Suppose $H(\bar{P}_n) = 5.5$; then we know that a direct application of the Basic
Compressor using $\psi_\ell[\cdot]$ will not produce efficient performance since $H(\bar{P}_n) > 4$.
That is, $\beta(n)$ is not close to $H(\bar{P}_n)$.

Applying $\psi_\ell[\cdot]$ to $\tilde{M}^{n-1}$ would still yield inefficient performance on the
n-1 most significant bits since by (3-5) $H(\bar{P}_{n-1}) \approx 4.5 > 4$ (see Fig. 2-5).

However, $H(\bar{P}_{n-2}) \approx 3.5 < 4$ so that we can expect that the Basic
Compressor, via operator $\psi_\ell[\cdot]$, will yield

    $\beta(n-2) \approx H(\bar{P}_{n-2})$    (3-6)

Then using (3-5) twice we have

    $\beta(n-2) + 2 \approx H(\bar{P}_n)$    (3-7)

This suggests that we can obtain efficient coding of the original input data $\tilde{M}^n$
by coding $\tilde{M}^{n-2}$ with the Basic Compressor and transmitting all the least
significant bits, $\tilde{L}^2$, separately.

SPLIT SAMPLE MODES
Definition

We define the set of operators $\psi_7^m[\cdot]$ by

    $\psi_7^m[\tilde{M}^n] = \psi_\ell[\tilde{M}^m] * \tilde{L}^k$,  $\ell$ = 5 or 6    (3-8)

where $\tilde{M}^m$ and $\tilde{L}^k$ are defined in (3-1) - (3-3).

Block diagram. A block diagram of $\psi_7^m[\cdot]$ is given in Fig. 3-2.

Average Performance

The per sample performance of operator $\psi_7^m[\cdot]$ over the N samples is
given as

    $\pi(m) = \frac{\mathcal{L}(\psi_7^m[\tilde{M}^n])}{N} = \beta(m) + k$    (3-9)

Actual measurements of the $\pi(m)$ are shown in Fig. 3-3 for n = 8 bit input
samples and N large.
ADAPTIVE OPERATOR, $\psi_8[\cdot]$

We observe from Fig. 3-3 that at least one operator $\psi_7^m[\cdot]$ has average
performance that remains close to the average entropy line $H(\bar{P}_n)$. Thus we
need only select the proper one to use. Define operator $\psi_8[\cdot]$ by

    $\psi_8[\tilde{M}^n] = m^* * \psi_7^{m^*}[\tilde{M}^n]$    (3-10)

where $m^*$ is a selected value of m (the concatenated $m^*$ being interpreted as a
binary number).

Observe that $m^*$, just like ID in (2-19), is really a function of $\tilde{M}^n$ which
partitions the space of all input sequences into decision regions and takes on the
possible values of m. It remains to specify this decision rule to complete the
definition of $\psi_8[\cdot]$.

Optimum Decision

Using the optimum decision criterion (counting bits) we choose $m^*$ such
that

    $\mathcal{L}(\psi_7^{m^*}[\tilde{M}^n]) = \min_m \mathcal{L}(\psi_7^m[\tilde{M}^n])$    (3-11)
Fig. 3-3. Split-Sample Modes, Individual Average Performance

Simplified Rule

Proceeding as we did in Chapter II we can bound and approximate the
performance of each $\psi_7^m[\cdot]$ by using (2-74) and (2-75). We have from (3-8),
for $\ell$ = 5 or 6,

    $\mathcal{L}(\psi_7^m[\tilde{M}^n]) \le \gamma_7^m(\tilde{M}^n) = \gamma_\ell(\tilde{M}^m) + Nk$    (3-12)

where $\gamma_\ell(\tilde{M}^m)$ is a bound on the performance of operator $\psi_\ell[\cdot]$ applied to
the m MSB's of $\tilde{M}^n$ and Nk is a count of all the k LSB's of input sequence $\tilde{M}^n$.

By the observations in (2-31) we also have typically

    $\mathcal{L}(\psi_7^m[\tilde{M}^n]) \approx \gamma_7^m(\tilde{M}^n)$    (3-13)

We can now specify a simplified rule for determining the choice of m.

Choose $m^*$ such that

    $\gamma_7^{m^*}(\tilde{M}^n) = \min_m \gamma_7^m(\tilde{M}^n)$    (3-14)

Letting $\mathcal{L}(m^*)$ denote the number of bits required to represent a decision
(i.e., $m^*$) we have

    $\mathcal{L}(\psi_8[\tilde{M}^n]) \le \gamma_8(\tilde{M}^n) = \mathcal{L}(m^*) + \gamma_7^{m^*}(\tilde{M}^n)$    (3-15)

and where typically

    $\mathcal{L}(\psi_8[\tilde{M}^n]) \approx \gamma_8(\tilde{M}^n)$    (3-16)

Block Diagram, $\psi_8[\cdot]$

A block diagram of operator $\psi_8[\cdot]$ using the simplified decision rule of
(3-14) is shown in Fig. 3-3.

† Note that the length of a fundamental sequence generated by using m MSB's can
be related to an FS length generated using fewer MSB's by the sum of the
additional split LSB's.

Block Size of $\tilde{M}^n$

We assumed that the length of sequence $\tilde{M}^n$ was large to obtain the
statistical performance results for individual split sample modes in Fig. 3-3.
However, there is no fundamental reason that N be large. In fact, the tradeoff
is much like that required in choosing a good Basic Compressor block size in
Chapter II. The smaller N is, the more rapidly operator $\psi_8[\cdot]$ can change to
accommodate variations in data statistics. At the same time per sample cost
in overhead to identify the selected split sample mode, $\mathcal{L}(m^*)$, increases.

For most typical applications the choice of N is not critical and one simply
chooses N to be some convenient value which is large enough to make $\mathcal{L}(m^*)$
negligible. For example if there are four split-sample modes included as
options and the data character is slowly varying then a convenient block size
of N = 64 (e.g., four Basic Compressor blocks using J = 16) only adds the
negligible overhead of 1/32 bits/sample. In these situations the average per
sample performance of operator $\psi_8[\cdot]$, using the optimum or simplified
decision rule, will generally ride the lower envelope of the curves in Fig. 3-3
(where the computation of average performance and entropy is assumed to be
determined over some $N' \gg N$).

When the variation of data statistics is more rapid, the average per
sample performance of $\psi_8[\cdot]$ may be less than the average entropy. In special
cases it may be advantageous to choose N as small as the length of one Basic
Compressor block J (e.g., 16). That is, the additional adaptivity may more
than compensate for the increase in overhead.† In fact additional simplifications
turn up once N is constrained to equal J.


† Note that it is a simple matter to investigate the effects of changing parameters
(N, J) using estimators $\gamma_8(\cdot)$, $\gamma_7(\cdot)$, etc. Complete simulations are replaced
primarily by counting operations.
Operator $\psi_9[\cdot]$. With N = J we are led to simplify $\psi_8[\cdot]$ to the following

    $\psi_9[\tilde{M}^n] = ss * \psi_4[\tilde{M}^n]$                                  if ss = 0
    $\psi_9[\tilde{M}^n] = ss * m^* * \psi_2[\tilde{M}^{m^*}] * \tilde{L}^{k^*}$      if ss = 1    (3-17)

The binary term ss indicates whether the input data is to be split or not.
If ss = 0 the unmodified input data is coded directly using the Basic Compressor
operator $\psi_4[\cdot]$, whereas ss = 1 designates a split with $m^*$ indicating what the
split is as for $\psi_8[\cdot]$. However, unlike $\psi_8[\cdot]$, the split MSB samples making up
$\tilde{M}^{m^*}$ are coded using code operator $\psi_2[\cdot] = CFS[\cdot]$ only, rather than the four
options of $\psi_4[\cdot]$. This eliminates the need for the two ID bits associated with
$\psi_4[\cdot]$. It also simplifies the decision rules since there is only one code option
for each split mode.

Based on observed performance using high entropy image data, $\psi_8[\cdot]$ and
$\psi_9[\cdot]$ perform essentially the same with N = J = 16. This is because a $\psi_4[\cdot]$
decision is almost always to use $\psi_2[\cdot]$ when entropies are high. A slight
improvement in average performance can be obtained by including $\psi_1[\cdot] = FS[\cdot]$ as an
additional option when ss = 1 in (3-17).

Further Simplification Notes 

In some applications decision making can be further simplified with little
loss in performance. For example, the correct split-sample modes can be
accurately predicted from the results of coding previous blocks in the
application described in Refs. 2 and 3. In other cases the computation associated with
determining fundamental sequence lengths for each $\tilde{M}^m$ (adding the split MSB and
LSB samples) can be reduced. This result is obtained by noting that when
entropies are high, a fundamental sequence length associated with the m MSB's
of a sequence will (typically) be roughly half that for the m+1 MSB's.

Moving the Preprocessing Operations 

All the discussions thus far in this chapter have assumed that input data
has been previously preprocessed as described in Fig. 1-1. In some
applications the split-sample operations can be performed before this preprocessing
with precisely the same average effects. An important example includes the
differencing of adjacent image samples as described in Refs. 2, 3 and 10.
Although the same performance can be expected, it is more difficult to provide
a parallel implementation of the code estimators when there are several split
sample modes.

EXAMPLE 

The following example should help make the previous discussions seem
less abstract. Let

    $\tilde{M}^n = \tilde{Y}_1^n * \tilde{Y}_2^n * \tilde{Y}_3^n$    (3-18)

be an N = 52 sample preprocessed input sequence partitioned into three (Basic
Compressor) blocks of $J_1 = 16$, $J_2 = 16$ and $J_3 = 20$ samples respectively where

    $\tilde{Y}_1^n$ = 3,5,1,0,2,1,2,2,7,10,10,22,14,7,0,14

    $\tilde{Y}_2^n$ = 22,22,3,0,1,5,3,0,21,17,5,4,5,1,7,13

    $\tilde{Y}_3^n$ = 11,2,5,7,2,1,17,3,6,6,2,1,1,5,6,6,6,22,16,0    (3-19)

We assume that the data originates from a source with alphabet size q = 32 so
that n = 5 and we wish to code $\tilde{M}^n$ using operator $\psi_8[\cdot]$, assuming two
split-sample modes with m = n = 5 and m = 4.

Simplified Decision 

Referring to Fig. 3-3, we first need to determine which split-sample
mode to use by comparing the estimates $\gamma_7^5$ and $\gamma_7^4$.

m = 5. With m = 5 we have from (3-12) $\gamma_7^5(\tilde{M}^5) = \gamma_5(\tilde{M}^5)$ which by (2-74)
is simply the sum of the Basic Compressor estimates on each of the input
blocks, $\gamma_4(\tilde{Y}_i^5)$. Performing the required computations using the methods of
Chapter II we get

    $\gamma_7^5(\tilde{M}^5) = 73 + 82 + 91 = 246$    (3-20)

m = 4. Representing the samples of $\tilde{M}^5$ as five bit numbers and applying
split-sample operator $SS^4[\cdot]$ (described in (3-1)-(3-3)) we obtain, with an
obvious extension of notation

    $\tilde{M}^4 = \tilde{M}_1^4 * \tilde{M}_2^4 * \tilde{M}_3^4$,  $\tilde{L}^1 = \tilde{L}_1^1 * \tilde{L}_2^1 * \tilde{L}_3^1$    (3-21)

where

    $\tilde{M}_1^4$ = 1,2,0,0,1,0,1,1,3,5,5,11,7,3,0,7

    $\tilde{M}_2^4$ = 11,11,1,0,0,2,1,0,10,8,2,2,2,0,3,6

    $\tilde{M}_3^4$ = 5,1,2,3,1,0,8,1,3,3,1,0,0,2,3,3,3,11,8,0    (3-22)

and the least significant bits are

    $\tilde{L}_1^1$ = 1,1,1,0,0,1,0,0,1,0,0,0,0,1,0,0
    $\tilde{L}_2^1$ = 0,0,1,0,1,1,1,0,1,1,1,0,1,1,1,1
    $\tilde{L}_3^1$ = 1,0,1,1,0,1,1,1,0,0,0,1,1,1,0,0,0,0,0,0    (3-23)


Then

    $\gamma_5(\tilde{M}^4) = 55 + 59 + 68 = 182$

using Basic Compressor operator $\psi_2[\cdot] = CFS[\cdot]$ on each $\tilde{M}_i^4$. Then by (3-12)

    $\gamma_7^4(\tilde{M}^5) = \gamma_5(\tilde{M}^4) + 52(1) = 234$    (3-24)

Decision. The decision rule in Fig. 3-3 would lead us to select $m^* = 4$
since $\gamma_7^4(\tilde{M}^5) < \gamma_7^5(\tilde{M}^5)$. Since there are only two possible split sample modes
we have from (3-24) and (3-15)

    $\mathcal{L}(\psi_8[\tilde{M}^5]) \le 235$    (3-25)

Actual Coding

The actual coding of $\tilde{M}^n$ takes the form

    $\psi_8[\tilde{M}^n] = m^* * \psi_7^{m^*}[\tilde{M}^n]$
                        $= m^* * \psi_5[\tilde{M}^{m^*}] * \tilde{L}^{k^*}$    (3-26)
By previous results we have decided to use $m^* = 4$ which can be arbitrarily
represented with a binary 1. Then (3-26) can be expanded to

    $\psi_8[\tilde{M}^n] = 1 * \psi_4[\tilde{M}_1^4] * \psi_4[\tilde{M}_2^4] * \psi_4[\tilde{M}_3^4] * \tilde{L}^1$    (3-27)

Noting that the individual Basic Compressor decisions leading to $\gamma_5(\tilde{M}^4)$ were
$\psi_2[\cdot] = CFS[\cdot]$ for each block, (3-27) further breaks down to

    $\psi_8[\tilde{M}^n] = 1 * 10 * \psi_2[\tilde{M}_1^4] * 10 * \psi_2[\tilde{M}_2^4] * 10 * \psi_2[\tilde{M}_3^4] * \tilde{L}^1$    (3-28)†

The reader may verify that

    $\psi_2[\tilde{M}_1^4]$ = 10111100111011110110110001000100
                              000100001011001100100    (3-29)

    $\psi_2[\tilde{M}_2^4]$ = 00010000010011100110111011100010
                              0001001001001001101010100    (3-30)

    $\psi_2[\tilde{M}_3^4]$ = 01001011011001110000100101100011
                              1011111010110001101010001010011100    (3-31)

and that equality is obtained in (3-25)

    $\mathcal{L}(\psi_8[\tilde{M}^n]) = 235$    (3-32)

† Note that successive use of operator $\psi_9[\cdot]$ in (3-17) would require fewer bits
for this example.
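
The mode decision of this example can be checked by counting alone. The Python sketch below is an illustration under stated assumptions: the helper names `gamma_4` and `split_mode_estimate` are hypothetical, the fixed-length cost of a block is taken to be its bits per sample times its length, and each block is estimated with the Basic Compressor bound $2 + \gamma_{ID}(F)$ of (2-28).

```python
from math import ceil

def gamma_4(block, bits_per_sample):
    """Basic Compressor estimate 2 + gamma_ID(F) for one block, Eq. (2-28)."""
    J = len(block)
    F = J + sum(block)                    # FS length, Eq. (2-4)
    third = ceil(F / 3)
    m_fixed = bits_per_sample * J         # assumed fixed-length cost
    return 2 + min(third + 2 * (F - J), F, third + 2 * J, m_fixed)

def split_mode_estimate(blocks, n, m):
    """Estimate (3-12): code the m MSB's of each block, send k = n - m LSB's raw."""
    k = n - m
    n_samples = sum(len(b) for b in blocks)
    msb_bits = sum(gamma_4([s >> k for s in b], m) for b in blocks)
    return msb_bits + n_samples * k

blocks = [
    [3, 5, 1, 0, 2, 1, 2, 2, 7, 10, 10, 22, 14, 7, 0, 14],
    [22, 22, 3, 0, 1, 5, 3, 0, 21, 17, 5, 4, 5, 1, 7, 13],
    [11, 2, 5, 7, 2, 1, 17, 3, 6, 6, 2, 1, 1, 5, 6, 6, 6, 22, 16, 0],
]
estimates = {m: split_mode_estimate(blocks, 5, m) for m in (5, 4)}
print(estimates)                          # {5: 246, 4: 234}
best_m = min(estimates, key=estimates.get)
print(best_m, estimates[best_m] + 1)      # 4 235 (one bit identifies the mode)
```

The two estimates agree with (3-20) and (3-24), and adding the one-bit mode identifier gives the 235 bits of (3-25).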
Observation. Note that the actual location of the LSB sequences $\tilde{L}_j^k$
need not be as prescribed in the development of $\psi_8[\cdot]$. For example, they
could all be located before the Basic Compressor blocks, or individually
after each Basic Compressor block they correspond to. Such variations to
$\psi_8[\cdot]$ have no effect on performance but may be a useful implementation
consideration.
IV. EXTENDING EFFICIENT PERFORMANCE TO VERY LOW ENTROPIES
FOR NON-BINARY SOURCES


As noted from Fig. 2-5 if the variations in data entropy result in values
much below 1 bit/sample the efficiency of the Basic Compressor operator
$\psi_4[\cdot]$ suffers. This chapter develops a code operator structure for extending
good performance over this lower range of $H(\bar{P})$ when the distributions P of
non-binary sources are a priori unknown except for the usual preprocessing
assumptions in Chapter I (see Fig. 1-1). A basic requirement in the definition
of this structure is the existence of a separate code operator which is capable
of providing efficient coding of binary memoryless sources with a priori unknown
(and varying) probabilities. Since the latter subject is of general interest by
itself it is given a separate treatment in Chapter V which provides the
development of a class of such binary code operators. Appropriate substitution
of these results into the operator structure of Chapter IV completes the
definition of a class of non-binary code operators which will maintain efficient
performance when $H(\bar{P})$ is very low.

REWRITING THE ENTROPY EQUATION 
Data Model 

We can obtain motivation for developing an adaptive code operator by first
investigating an idealized data model. Let

    $Z = z_1 z_2 \cdots z_T$    (4-1)

be a T sample sequence from a discrete memoryless source with known and
fixed probability distribution P with the usual symbols 0, 1, 2, ..., q-1 and the
probability ordering of (1-8) (i.e., the idealized output of preprocessing
operations in Fig. 1-1).
Relationship with $\bar{P}$

As in earlier chapters, $\bar{P}$ denotes a measured average distribution of
samples. If these samples are from the ideal memoryless source with
distribution P then $\bar{P}$ will equal P if we make the measurement span long enough.

Splitting the Source

When $H(\bar{P}) = H(P)$ drops much below 1 bit/sample the sample distributions
start taking the form shown in Fig. 4-1.

Fig. 4-1. Sample Distributions When $H(P) \to 0$
The important characteristic to notice is that the probability spike at zero
seems out of place from the remainder of the distribution. The coding philosophy
we have followed so far suggests that it may be advantageous to treat these
distinctly different parts of the distribution separately.

Indeed the source characterized by distributions in Fig. 4-1 can be viewed
as a mixture of two sources, one with the symbol zero and the other with the
remaining symbols 1, 2, ..., q-1. More simply we can get the desired
motivation by manipulating the basic entropy equation in (1-9) rewritten here as

    $H(P) = -p_0 \log_2 p_0 - (1 - p_0) \sum_{i \ne 0} \frac{p_i}{1 - p_0} \log_2 p_i$    (4-2)

After some manipulation this yields

    $H(P) = H_\beta(p_0) + (1 - p_0) H_\theta$    (4-3)

where

    $H_\beta(p_0) = -p_0 \log_2 p_0 - (1 - p_0) \log_2 (1 - p_0)$    (4-4)

and

    $H_\theta = -\sum_{i \ne 0} \frac{p_i}{1 - p_0} \log_2 \frac{p_i}{1 - p_0}$    (4-5)

We recognize the first term, $H_\beta(p_0)$, as the entropy of a binary memoryless
source with the probability of a zero equal to $p_0$. More specifically, let

D (4-6) 


i>0 



T be the sample binary sequence which identifies whether a symbol of Z is a 
zero or not. Then Hp(p^) can be viewed as the minimum average bits/ sample 
required to code D. 

Since $\sum_{i \neq 0} p_i/(1 - p_0) = 1$, $H_\theta$ is the entropy of a
discrete memoryless source with symbols i = 1, 2, ..., q-1 and probabilities
$p_i/(1 - p_0)$. The latter terms are simply the conditional probabilities

$$\Pr[\text{symbol} = i \mid i \neq 0] = \frac{p_i}{1 - p_0} \tag{4-7}$$

of the original source, which, with the exception of the missing zero, also
satisfy (1-8) if the $p_i$ do. Then $H_\theta$ can be viewed as the minimum
average bits/sample required to code all the non-zero samples of Z (where T is
large), and $(1 - p_0)$ represents the fraction of all the original symbols
which would typically be included in this sequence.

More specifically let*

$$\theta = \theta_1 \theta_2 \theta_3 \cdots \tag{4-8}$$

be the sequence of all the non-zero samples of Z. Then $H_\theta$ can be viewed
as the minimum average bits/sample required to code $\theta$, and $(1 - p_0)$
represents the fraction of all the samples of Z included in $\theta$. That is

$$E\{\mathscr{L}(\theta)\} = (1 - p_0) T \tag{4-9}$$

Then by (4-3), the coding of long sequences from the original source close
to H(P) can be achieved by "splitting" the sequence into two new sequences D
and $\theta$, as described above, and then coding each of them close to their
corresponding entropies $H_\beta(p_0)$ and $H_\theta$.

*We will later relabel the $\theta$ symbols to 0, 1, 2, ..., q-2. This has no
effect on the entropy arguments.
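
The decomposition in (4-3) can be checked numerically. The following is a minimal sketch, not part of the original development; the distribution `P` is a hypothetical example with a spike at zero, and the function names are illustrative only. It assumes p_0 < 1 so that the conditional probabilities in (4-7) are defined.

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution (zero terms contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def split_entropy(probs):
    """Return (H_beta(p0), (1 - p0) * H_theta), the two terms of (4-3).

    Assumes probs[0] is p0 and that p0 < 1.
    """
    p0 = probs[0]
    h_beta = entropy([p0, 1.0 - p0])                  # binary entropy, (4-4)
    conditional = [p / (1.0 - p0) for p in probs[1:]]  # probabilities in (4-7)
    return h_beta, (1.0 - p0) * entropy(conditional)   # H_theta from (4-5)

# Hypothetical low-entropy distribution (illustrative values only).
P = [0.90, 0.05, 0.03, 0.02]
h_beta, weighted_h_theta = split_entropy(P)
print(entropy(P), h_beta + weighted_h_theta)
```

The two printed values agree to within floating point rounding, which is all that (4-3) asserts.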

The remainder of this chapter and Chapter V will seek to develop adaptive
code operators which take advantage of these concepts while recognizing that
real world problems are not quite so idealized. While the memoryless assumption
and probability ordering can usually be approximated well in practice, the
real P is generally a priori unknown and varying. Measured distributions (of P
and of $p_0$) may even average out variations in data characteristics. In these
practical situations, as in previous chapters, the entropy expressions in
(4-3) - (4-5) serve as practical guides to performance but may no longer be
bounds.
OPERATOR SPLIT[·]

Using the above discussions as a guide we define the reversible SPLIT[·]
operator by

$$\text{SPLIT}[Z] = D * \theta \tag{4-10}$$

where Z is the T sample sequence in (4-1), D is a T sample binary sequence
"identifying" which samples of Z are non-zero, and $\theta$ is a sequence of all
the non-zero samples of Z reduced by 1.† Specifically

$$D = d_1 d_2 \cdots d_T \tag{4-11}$$

where

$$d_i = \begin{cases} 0 & \text{if } z_i = 0 \\ 1 & \text{otherwise} \end{cases} \tag{4-12}$$


and the

$$N = \sum_{i=1}^{T} d_i \tag{4-13}$$

samples of

$$\theta = \theta_1 \theta_2 \theta_3 \cdots \theta_N \tag{4-14}$$

can be generated by testing successive samples of Z and creating new samples
of $\theta$ from Z by the rule

$$\text{If } z_i > 0: \text{ create the next sample of } \theta \text{ as } \theta_j = z_i - 1 \tag{4-15}$$

Observe that by subtracting 1 we have essentially done the relabeling
described in Fig. 1-1. By (4-7) the relabeled $\theta$ symbols 0, 1, 2, ..., q-2
will satisfy the desired probability ordering in (1-8) if the input symbols do.

†This amounts to a relabeling of the original $\theta$ in (4-9).

Reconstructing Z from D and $\theta$ is obtained simply by testing successive
samples of D and generating the corresponding samples of Z from $\theta$ by the
rule:

$$z_i = \begin{cases} 0 & \text{if } d_i = 0 \\ \theta_j + 1 & \text{if } d_i = 1, \text{ where } \theta_j \text{ is the next sample of } \theta \end{cases} \tag{4-16}$$
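
The rules (4-12), (4-15) and (4-16) can be sketched as follows. This is an illustrative sketch only; the function names `split` and `unsplit` and the sample sequence are assumptions introduced here, not names from the report.

```python
def split(z):
    """SPLIT[Z]: return (D, theta) following (4-12) and (4-15)."""
    d = [0 if sample == 0 else 1 for sample in z]     # (4-12)
    theta = [sample - 1 for sample in z if sample > 0]  # (4-15)
    return d, theta

def unsplit(d, theta):
    """Rebuild Z from D and theta following (4-16)."""
    remaining = iter(theta)
    # For each d_i = 1, consume the next theta sample and add 1 back.
    return [0 if bit == 0 else next(remaining) + 1 for bit in d]

# Hypothetical 8 sample sequence (illustrative values only).
z = [0, 0, 3, 0, 1, 0, 0, 2]
d, theta = split(z)
print(d, theta, unsplit(d, theta))
```

Note that the length of `theta` equals the number of ones in `d`, so no separate length field is needed to reverse the operation.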
As noted in earlier discussions, operator $\psi_9[\cdot]$ will be an efficient
operator for Z sequences if $\psi_\beta[\cdot]$ and $\psi_\theta[\cdot]$ are
efficient operators for D and $\theta$ sequences respectively. Since
distributions of Z are in practice a priori unknown and varying, so too are the
corresponding distributions of samples of D and $\theta$. For $\psi_9[\cdot]$ to
be an effective operator, $\psi_\beta[\cdot]$ and $\psi_\theta[\cdot]$ must be
able to adapt to variations in the statistical character of D and $\theta$.

Coding of $\theta$, $\psi_\theta[\cdot]$

Of course any algorithm for coding $\theta$ could be substituted for
$\psi_\theta[\cdot]$. For our purposes we will assume an operator form utilizing
the Basic Compressor, such as the variable length operator in (2-70) or some
equivalent variable length form utilizing split sample modes (see Fig. 3-3).
The specific algorithm details would of course depend on the particular
application. However, based on the results of previous chapters an appropriate
choice of such operators should provide a broad range of efficient performance
as entropy $H_\theta$ in (4-5) varies.

Estimate of $\mathscr{L}(\psi_\theta[\theta])$. Since we have specified that
$\psi_\theta[\cdot]$ be an operator form utilizing the Basic Compressor,
estimates of performance of the type in (2-75) are easily obtained. Following
earlier notation, we let $\gamma_\theta(\theta)$ represent this estimate.

Coding of D, $\psi_\beta[\cdot]$

Again, any algorithm for coding the binary D sequences could be substituted
for $\psi_\beta[\cdot]$ in (4-17). We will provide an efficient class of such
binary code operators in Chapter V.

Estimate of $\mathscr{L}(\psi_\beta[D])$. As we did for $\psi_\theta[\cdot]$,
we let $\gamma_\beta(D)$ represent an estimate of $\mathscr{L}(\psi_\beta[D])$
of the form in (2-73). Then as in Fig. 4-2 we also have the equivalent form for
$\psi_9[\cdot]$

$$\gamma_9(Z) = \gamma_\beta(D) + \gamma_\theta(\theta) \tag{4-18}$$
Operator $\psi_{10}[\cdot]$

The design of operator $\psi_9[\cdot]$ is motivated by the distinct spike at
zero in symbol probability distributions when H(P) → 0. However, by (4-3)
operator $\psi_9[\cdot]$ will be efficient for any H(P) provided both operators
$\psi_\beta[\cdot]$ and $\psi_\theta[\cdot]$ are. Unfortunately requiring
$\psi_9[\cdot]$ to be efficient at high entropies in addition to low entropies
places unnecessary demands on the design of $\psi_\beta[\cdot]$. For example,
when H(P) is very low $p_0 \to 1$ whereas when H(P) becomes large and the
distributions flatten, $p_0 \to 0$. Thus $\psi_\beta[\cdot]$ is required to be
efficient for unknown and varying $p_0$ in the range from 0 to 1. This
additional requirement can be avoided in most applications by instead
specifying another operator, $\psi_{10}[\cdot]$, which selects between
$\psi_9[\cdot]$ and some operator which can efficiently code Z at the higher
entropies. Such operators were the subject of previous chapters. The most
general form was $\psi_8[\cdot]$ in Fig. 3-3 for which special cases include
operator $\psi_4[\cdot]$ or $\psi_5[\cdot]$ when there is no need for
split-sample modes. In this case the 4 or 5 replaces 8 in the following
discussions.

It is a simple matter to define $\psi_{10}[\cdot]$ following the same
procedures as in earlier chapters

$$\psi_{10}[Z] = \lambda * \psi_\lambda[Z] \tag{4-19}$$

where $\lambda$ is the selected code operator number 8 or 9 (the concatenated
$\lambda$ being interpreted as a binary zero or one). It remains to specify the
decision rule to complete the definition of $\psi_{10}[\cdot]$.

Optimum rule. Again the optimum rule is simply to count bits, choosing
$\lambda$ such that

$$\mathscr{L}(\psi_\lambda[Z]) = \min_{i = 8, 9} \mathscr{L}(\psi_i[Z]) \tag{4-20}$$
However, use of the Basic Compressor estimates can result in
considerable simplification.

Simplified rule. Instead choose $\lambda$ such that

$$\gamma_\lambda(Z) = \min_{i = 8, 9} \gamma_i(Z) \tag{4-21}$$

Noting that $\lambda$ requires one bit for identification we also have

$$\mathscr{L}(\psi_{10}[Z]) \le \gamma_{10}(Z) = 1 + \gamma_\lambda(Z) \tag{4-22}$$

and where typically

$$\mathscr{L}(\psi_{10}[Z]) \approx \gamma_{10}(Z) \tag{4-23}$$

With $\psi_8[\cdot]$ available as an option which performs efficiently when
H(P) is high, a less sophisticated form of $\psi_\beta[\cdot]$ (which works well
only at very low entropies) can be assumed. In this case, the decision rules in
(4-20) and (4-21) would always choose $\lambda = 8$ when H(P) was high.

Further simplifications. To further take advantage of the existence of
the option $\psi_8[\cdot]$, note that in general a $\theta$ buffer equal in size
to the length of Z is required. But this buffer will tend to fill up only when
H(P) is high and $p_0$ is low. But then $\psi_8[\cdot]$ is the code choice.
Consequently the required buffer size can be reduced. Let

$$B_\theta \tag{4-24}$$

be the length of this buffer and supplement the simplified decision rule by
adding

$$\text{Set } \lambda = 8 \text{ if } \mathscr{L}(\theta) \ge B_\theta \tag{4-25}$$

to override the decision in (4-21). $B_\theta$ can be experimentally chosen so
that with high probability there is negligible loss in performance.
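
The simplified rule with the buffer override can be sketched as below. This is a hypothetical illustration: the estimates `gamma_8` and `gamma_9` are assumed to be supplied by Basic Compressor estimators from earlier chapters (not reproduced here), the comparison used for the buffer test is an assumption, and ties are broken in favor of operator 8 purely for definiteness.

```python
def select_operator(gamma_8, gamma_9, theta_length, buffer_length):
    """Choose the operator number (8 or 9) for the combined operator.

    gamma_8, gamma_9: estimated coded lengths in bits (assumed inputs).
    theta_length:     number of non-zero samples collected so far.
    buffer_length:    size of the reduced theta buffer.
    """
    if theta_length >= buffer_length:
        return 8                      # override of the form (4-25)
    return 8 if gamma_8 <= gamma_9 else 9  # simplified rule (4-21)

def estimate_total(gamma_selected):
    """Estimate of the total length: one identifier bit plus the choice, (4-22)."""
    return 1 + gamma_selected

# Hypothetical estimates (illustrative values only).
choice = select_operator(gamma_8=400, gamma_9=250, theta_length=52, buffer_length=128)
print(choice, estimate_total(250))
```

With a full buffer the override wins regardless of the estimates, which is what allows the buffer to be smaller than the length of Z.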

Block Diagram, $\psi_{10}[\cdot]$

A block diagram describing $\psi_{10}[\cdot]$ using (4-21) and (4-25) appears
in Fig. 4-3.

PERFORMANCE

Extensive tests of a sophisticated $\psi_{10}[\cdot]$, using appropriate
substitutions for $\psi_\beta[\cdot]$ from Chapter V, indicate average
performance which remained close to H(P) for any H(P) in the range of 0 to 8
bits/sample, where P was a priori unknown and not changing significantly over
the measurement span. This is shown in Fig. 4-4.

Average performance considerably under H(P) was observed in situations†
where P changed significantly over the measurement span.


EXAMPLE

An example of the use of SPLIT[·] is given in Fig. 4-5 for a T = 256 sample
Z sequence. Observe that the $\theta$ that results is the same as the sequence
in the example of (3-18) and (3-19). Thus the coding of $\theta$ for operator
$\psi_9[\cdot]$ in (4-17) has already been described as a special case of an
earlier operator, with

$$\mathscr{L}(\psi_\theta[\theta]) = 235 \tag{4-26}$$

This example will be continued in Chapter V after first developing appropriate
binary code operators for D (see 5-47).

†These tests were run on transform coefficients of the RM2 image compressor
algorithm.
Fig. 4-3. Block Diagram, Operator $\psi_{10}[\cdot]$

Fig. 4-5. Example SPLIT[·]



V. CODING FOR BINARY MEMORYLESS SOURCES WITH UNKNOWN STATISTICS

This chapter provides a class of efficient code operators for real binary
sources which can be realistically modeled as memoryless with a priori unknown
and varying probabilities.

Several of these binary operators, denoted $\psi_\beta^{9}[\cdot]$ and
$\psi_{\beta'}^{9}[\cdot]$, are intimately related to the non-binary operators,
$\psi_9[\cdot]$ and $\psi_{10}[\cdot]$, developed in Chapter IV. It will be
advantageous for the reader to become familiar with the structure of the
latter operators before proceeding into the details here.

PRACTICAL ASSUMPTIONS

Data Model

Let

$$D = d_1 d_2 \cdots d_T \tag{5-1}$$

be a T sample sequence where the $d_i$ are the output of a binary memoryless
source with probability of a zero, $p_0$. To reflect real world variations in
$p_0$ we will generally assume that the $p_0$ for each D is a priori unknown and
that $p_0$ can lie in the range $0 < p_0 < 1$.

Binary Entropy

From (4-4) the binary entropy function is given as

$$H_\beta(p_0) = -p_0 \log_2 p_0 - (1 - p_0) \log_2 (1 - p_0) \tag{5-2}$$

and is shown in Fig. 5-1 for $0 < p_0 < 1$.
Fig. 5-1. Binary Entropy, $H_\beta(p_0)$, bits/sample

Quite unsurprisingly, $H_\beta(p_0)$ is symmetrical about $p_0 = 1/2$ where
$H_\beta(p_0)$ reaches its maximum value of 1. At this point zeroes and ones
are totally random.
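
The symmetry and maximum of the binary entropy function are easy to confirm numerically. The sketch below is illustrative only; the function name `binary_entropy` is an assumption, and the endpoint values are taken as 0 by the usual convention.

```python
import math

def binary_entropy(p0):
    """Binary entropy in bits/sample for probability of a zero p0, as in (5-2)."""
    if p0 <= 0.0 or p0 >= 1.0:
        return 0.0  # conventional limiting value at the endpoints
    return -p0 * math.log2(p0) - (1.0 - p0) * math.log2(1.0 - p0)

for p0 in (0.05, 0.2, 0.5, 0.8, 0.95):
    print(p0, binary_entropy(p0))
```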

As noted from previous discussions the interpretation of entropy in
practical problems usually requires some caution. The use of $H_\beta(\cdot)$
generally requires the measurement of an average probability, $p_0$, determined
over a span of samples which may be much larger than the length of D. When the
real $p_0$ is slowly varying over this measurement span then average code
performance above but close to $H_\beta(p_0)$ can be viewed as "efficient."
$H_\beta(p_0)$ acts as an approximate bound in this case because the data
behaves like an ideal memoryless source. However, if data statistics are
significantly changing over the measurement span, the measured average $p_0$
will average out the changes. Since $H_\beta(p_0)$ cannot account for the
possibility of adapting a coder to these variations, average performance below
$H_\beta(p_0)$ may be possible.

The remainder of this chapter will seek to develop binary code operators 
which exhibit efficient performance characteristics in the sense described above. 

PREPROCESSING OF D 

The samples of D are not in a form suitable for a direct application of
previously developed code operators. This section will provide the necessary
preprocessing of D. In so doing we will restrict the choice of various
parameters to simplify discussion and potential practical implementations.
However, it is felt that the chosen parameters are a good choice from a
performance standpoint also. More general investigations are left for further
study.

e-th Extension

We first restrict the length of D to be a multiple of e so that

$$\mathscr{L}(D) = T = \tau e \tag{5-3}$$

Using operator $\text{Ext}^e[\cdot]$ from (2-b) yields the e-th extension of D

$$D' = \text{Ext}^e[D] = d'_1 d'_2 \cdots d'_\tau \tag{5-4}$$

where the $\tau$ samples of D' take on the values 0, 1, 2, ..., $2^e - 1$
determined by the corresponding binary e-tuples of D. That is, the standard
binary representation of $d'_i$ is simply the i-th consecutive e-tuple of D. We
will principally rely on the context of a discussion to identify whether a
$d'_i$ refers to its non-binary value or its binary representation.
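
A minimal sketch of the extension step follows. The function name `ext` is hypothetical, and the sketch assumes that the first bit of each e-tuple is the most significant bit of its standard binary representation.

```python
def ext(bits, e):
    """Group a binary sequence into consecutive e-tuples and return their values.

    Assumes the first bit of each e-tuple is the most significant bit.
    """
    if len(bits) % e != 0:
        raise ValueError("length of D must be a multiple of e, as in (5-3)")
    values = []
    for start in range(0, len(bits), e):
        value = 0
        for bit in bits[start:start + e]:
            value = (value << 1) | bit  # shift in the next bit
        values.append(value)
    return values

# Hypothetical 8 sample binary sequence (illustrative values only).
print(ext([0, 1, 1, 1, 1, 0, 0, 0], 4))
```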

e-tuple Probability. If the binary digits representing any $d'_i$ are the
result of a binary memoryless source then the probability that these digits
will form a particular e-tuple with j ones is easily given as

$$\Pr[\text{any e-tuple with } j \text{ ones}] = p_0^{\,e-j} (1 - p_0)^{j} \tag{5-5}$$

where $p_0$ is the usual probability that an individual binary digit will be
zero. There are

$$\binom{e}{j} = \frac{e!}{j!\,(e-j)!} \tag{5-6}$$

such e-tuples. Then, provided $p_0 > 1/2$, we must have

$$\Pr[\text{any e-tuple with } j \text{ ones}] > \Pr[\text{any e-tuple with } j' > j \text{ ones}] \tag{5-7}$$

Conversely, if $p_0 < 1/2$ the inequality sign in (5-7) reverses.
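
The ordering claimed in (5-7) can be checked directly from (5-5) and (5-6). The sketch below is illustrative; the values of `p0` and `e` are arbitrary examples and the function name is an assumption.

```python
from math import comb

def tuple_probability(p0, e, j):
    """Probability of one particular e-tuple containing j ones, as in (5-5)."""
    return p0 ** (e - j) * (1.0 - p0) ** j

e = 4
high = [tuple_probability(0.8, e, j) for j in range(e + 1)]  # p0 > 1/2
low = [tuple_probability(0.3, e, j) for j in range(e + 1)]   # p0 < 1/2

# Weighting each class by its size from (5-6) must account for all 2^e tuples.
total = sum(comb(e, j) * high[j] for j in range(e + 1))
print(high, low, total)
```

For p0 > 1/2 the list decreases with j, for p0 < 1/2 it increases, and the weighted total is 1 up to rounding.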

Thus while D' is a sequence of (approximately) independent samples taking
on the values 0, 1, 2, ..., $2^e - 1$, these values do not occur with the
desired probability ordering of (1-8). For example, by the above arguments

$$\Pr[d'_i = 4] = \Pr[\text{e-tuple } 000 \ldots 0100] > \Pr[d'_i = 3] = \Pr[\text{e-tuple } 00 \ldots 011] \tag{5-8}$$

where $p_0 > 1/2$. Thus D' must be preprocessed further before the results of
earlier chapters can be used.


Ordering Probabilities

We propose two reversible mappings of $d'_i$ samples given by $f_0[\cdot]$ and
$f_1[\cdot]$. The functional specification of these mappings can be described
by the following:

$$\Delta_i^0 = f_0[d'_i] \tag{5-9}$$

and

$$\Delta_i^1 = f_1[d'_i] \tag{5-10}$$

where $\Delta_i^0$ and $\Delta_i^1$ take on the values 0, 1, 2, ..., $2^e - 1$
(non-binary interpretation) and where

$$\Pr[\Delta_i^\xi = 0] \ge \Pr[\Delta_i^\xi = 1] \ge \cdots \ge \Pr[\Delta_i^\xi = 2^e - 1] \tag{5-11}$$

is satisfied for $\xi = 0$ when $p_0 \ge 1/2$ and for $\xi = 1$ when
$p_0 \le 1/2$ (provided the binary memoryless source model holds for the digits
of D).

Extending our notation slightly, the application of $f_\xi[\cdot]$ to all the
$\tau$ samples of D' in (5-4) yields the sequence given by

$$\Delta_\xi = f_\xi[D'] = f_\xi[d'_1] * f_\xi[d'_2] * \cdots * f_\xi[d'_\tau] = \Delta_1^\xi \Delta_2^\xi \cdots \Delta_\tau^\xi \tag{5-12}$$

where $\xi = 0$ or 1.
Then by the above assumptions for the $f_\xi[\cdot]$, either $\Delta_0$ or
$\Delta_1$ should meet the desired requirements for a preprocessed data
sequence in Fig. 1-1 when $0 < p_0 < 1$. Further, since the operations of
$\text{Ext}^e[\cdot]$ and $f_\xi[\cdot]$ are reversible, the per sample entropy
of $\Delta_\xi$ should be increased (by a factor of e) into the efficient
operating range of the Basic Compressor. The results of preceding chapters
should be directly applicable. A block diagram summarizing the discussion and
notation thus far is given in Fig. 5-2.


Derivation and Implementation of $f_\xi[\cdot]$

From (5-7) we see that to obtain the ordering in (5-11) for $p_0 > 1/2$ we
need assign the number zero to the all zero e-tuple, the consecutive numbers 1
to e to e-tuples with a single one, the numbers e + 1 up to
$e + \binom{e}{2}$ to e-tuples with two ones, and so on. Without any additional
constraints the particular assignment of numbers to e-tuples with the same
number of ones is arbitrary.

An $f_1[\cdot]$ mapping which results in the desired ordering in (5-11) when
$p_0 < 1/2$ can be described in exactly the same way by reversing the roles of
zeroes and ones. We will investigate this further.


Joint Implementation of $f_0[\cdot]$ and $f_1[\cdot]$. Let $\nu$ be any binary
e-tuple and $\bar{\nu}$ its bit by bit complement. We note that if $\nu$ is one
of the $\binom{e}{j}$ possible e-tuples with j ones then $\bar{\nu}$ is one of
the $\binom{e}{j}$ possible e-tuples with j zeroes. We can then implement
$f_0[\cdot]$ and $f_1[\cdot]$ as a table lookup by arranging all possible
e-tuples so that e-tuples with fewer ones appear at a higher position (closer
to the top) and so that if $\nu$ is in position k-1 from the top (counting
zero), $\bar{\nu}$ appears in position k-1 from the bottom. Then $f_0[\nu]$ is
the position of $\nu$ from the top. This is shown in Table 5-1.
Table 5-1. Arrangement of $f_0[\cdot]$ for Table Lookup

Then

$$f_1[\bar{\nu}] = f_0[\nu]$$

and similarly (5-13)

$$f_0[\bar{\nu}] = f_1[\nu]$$

This means that $f_1[\nu]$ can be obtained by first complementing an input
e-tuple and using $f_0[\cdot]$ as shown in Fig. 5-3. However, if both
$f_0[\nu]$ and $f_1[\nu]$ are simultaneously desired, this approach requires
two table lookups. This can be avoided.

Simply note that while $f_0[\nu]$ is the position of $\nu$ from the top of the
table, $f_1[\nu]$ is the position from the bottom so that

$$f_1[\nu] = 2^e - 1 - f_0[\nu] \tag{5-14}$$

But (5-14) is the same as

$$f_1[\nu] = \overline{f_0[\nu]} \tag{5-15}$$
where

$$\overline{f_0[\nu]} = \text{COMP}[f_0[\nu]]$$

is the bit by bit complement of $f_0[\nu]$ interpreted as a binary e-tuple. Thus
$f_0[\cdot]$ and $f_1[\cdot]$ can be implemented as a single table lookup as
shown in Fig. 5-4.

Fig. 5-3. Implementing $f_0[\cdot]$ and $f_1[\cdot]$, Method 1
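
The single table lookup can be sketched for a general e as follows. This is one valid arrangement under the stated constraints, not necessarily the one tabulated in the report: within a group of e-tuples having the same number of ones the assignment is arbitrary, and here ties are broken by numeric value, which happens to preserve the top/bottom mirror property. The function names are hypothetical.

```python
def build_f0(e):
    """Return a dict mapping each e-tuple value to its position from the top.

    Tuples are ordered by number of ones, then by numeric value (an assumed
    tie-break that keeps complements mirrored about the table center).
    """
    order = sorted(range(2 ** e), key=lambda v: (bin(v).count("1"), v))
    return {value: position for position, value in enumerate(order)}

def f1_from_f0(f0, value, e):
    """f1 obtained by complementing the e-bit output of f0, as in (5-15)."""
    mask = 2 ** e - 1
    return f0[value] ^ mask  # bit by bit complement of an e-bit number

e = 4
mask = 2 ** e - 1
f0 = build_f0(e)
print([(v, f0[v], f1_from_f0(f0, v, e)) for v in (0, 1, 7, 8, 15)])
```

Only one table is stored; the second mapping costs a single complement of the table output.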



Fig. 5-4. Simplified Implementation of $f_0[\cdot]$ and $f_1[\cdot]$

$f_0[\cdot]$ and $f_1[\cdot]$ for e = 4

We will henceforth restrict the parameter e to be four. This choice has
some practical advantages with no obvious disadvantages as far as coding
performance is concerned. Quite without surprise, we will seek to make use of
the code operators developed in previous chapters. If we restrict e to be 4,
the preprocessed samples of $\Delta_\xi$ in (5-12) will have entropies in the
range of 0 to 4 bits/sample, quite suitable for a direct application of the
Basic Compressor operator developed in Chapter II. Higher entropies might
necessitate split-sample modes as described in Chapter III. This would be an
unnecessary complication for this application.

A complete table defining both $f_0[\cdot]$ and $f_1[\cdot]$ for e = 4 appears
in Table 5-2.

GENERAL CODE OPERATOR STRUCTURE

Since by (5-11) and the preceding developments, $\Delta_0$ and $\Delta_1$
satisfy the desired requirements for preprocessed data sequences (Fig. 1-1),
the results of previous chapters should be directly applicable. The most
obvious example is to use the Basic Compressor operator, $\psi_4[\cdot]$, to
code D by treating $\Delta_0$ and $\Delta_1$ as single J = T/4 sample Basic
Compressor blocks, or to partition $\Delta_0$ and $\Delta_1$ into several
smaller blocks and then apply $\psi_5[\cdot]$ from (2-71).* More generally, any
operator, say $\psi_j[\cdot]$, can be used to code $\Delta_0$ and $\Delta_1$ in
the same manner. To this end we define the operator in (5-18).

*Note that when using $\psi_5[\cdot]$ there is some obvious practical advantage
to restricting the length of the $\Delta_i$ (and hence D) to be a multiple of
some convenient Basic Compressor block size J (e.g., 16). Then from (5-3) we
have

$$\mathscr{L}(\Delta_i) = \tau = \tau' J \tag{5-16}$$

and

$$\mathscr{L}(D) = 4 \tau' J \tag{5-17}$$
Table 5-2. $f_0[\cdot]$ and $f_1[\cdot]$ for e = 4

  Input 4-tuple ν         f_0[ν]                f_1[ν]
  Non-      As            Non-      As          Non-      As
  binary    4-tuple       binary    4-tuple     binary    4-tuple

    0       0000            0       0000         15       1111
    1       0001            1       0001         14       1110
    2       0010            2       0010         13       1101
    4       0100            3       0011         12       1100
    8       1000            4       0100         11       1011
    3       0011            5       0101         10       1010
    6       0110            6       0110          9       1001
    5       0101            7       0111          8       1000
   10       1010            8       1000          7       0111
    9       1001            9       1001          6       0110
   12       1100           10       1010          5       0101
    7       0111           11       1011          4       0100
   11       1011           12       1100          3       0011
   13       1101           13       1101          2       0010
   14       1110           14       1110          1       0001
   15       1111           15       1111          0       0000

















$$\psi_\beta^{j}[D] = \xi * \psi_j[\Delta_\xi] \tag{5-18}$$

where $\xi$ identifies the selected choice of preprocessed sequence $\Delta_0$
or $\Delta_1$ (the concatenated $\xi$ being interpreted as a binary zero or
one) and j is a priori fixed (e.g., 4 or 5).


Decision Rules

Optimum. By counting bits $\xi$ is chosen such that

$$\mathscr{L}(\psi_j[\Delta_\xi]) = \min_{i = 0, 1} \mathscr{L}(\psi_j[\Delta_i]) \tag{5-19}$$

Simplified rule. By using the Basic Compressor performance estimates
(bounds), $\xi$ is chosen such that

$$\gamma_j(\Delta_\xi) = \min_{i = 0, 1} \gamma_j(\Delta_i) \tag{5-20}$$

Since $\xi$ requires one bit for specification we have the bound and estimate
for $\psi_\beta^{j}[D]$ given by

$$\mathscr{L}(\psi_\beta^{j}[D]) \le \gamma_\beta^{j}(D) = 1 + \gamma_j(\Delta_\xi) \tag{5-21}$$

Majority rule. In some applications it may be acceptable to simply choose
$\xi$ by the majority rule

$$\xi = \begin{cases} 0 & \text{if zeroes are in the majority in } D \\ 1 & \text{otherwise} \end{cases} \tag{5-22}$$

This rule is most suitable when $p_0$ is slowly varying and the length of D is
large, or when an operator is expected to operate only for very low or very
high values of $p_0$.

Note that (5-21) is still a useful estimate when (5-22) is used instead of (5-20).
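
A majority rule of this kind might be sketched as below. This is an assumption-laden illustration: it takes the rule to select the mapping for p_0 above 1/2 when zeroes predominate, and it resolves an exact tie in favor of 0, a choice made here only for definiteness. The function name is hypothetical.

```python
def majority_rule(bits):
    """Return the identifier (0 or 1) selected by counting zeroes and ones in D.

    Returns 0 when zeroes are at least as numerous as ones (tie-break assumed).
    """
    zeros = bits.count(0)
    ones = len(bits) - zeros
    return 0 if zeros >= ones else 1

# Hypothetical sequences (illustrative values only).
print(majority_rule([0, 0, 0, 1]), majority_rule([1, 1, 0, 1]))
```

Unlike the simplified rule, this needs no code length estimates at all, only a count of the input bits.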


Block Diagram

A block diagram describing $\psi_\beta^{j}[\cdot]$ is shown in Fig. 5-5
assuming the simplified rule in (5-20).

Restricted Range of $p_0$, $\psi_{\beta'}^{j}[\cdot]$

In some applications the range of $p_0$ may be limited to either
$0 < p_0 < 1/2$ or $1/2 < p_0 < 1$. Quite obviously under these conditions,
there is no need to choose between $\Delta_0$ or $\Delta_1$ for coding
purposes, and correspondingly no need for the $\xi$ identifier in (5-18). It is
useful to define two simplified operators which fit these conditions. Define
$\psi_{\beta'}^{j}[\cdot]$ for $\xi = 0$ and 1 by

$$\psi_{\beta'}^{j}[D] = \psi_j[\Delta_\xi] \tag{5-23}$$

where we a priori choose $\xi = 0$ if $p_0 > 1/2$ and $\xi = 1$ otherwise. The
block diagram for $\psi_\beta^{j}[\cdot]$ in Fig. 5-5 reduces to that shown in
Fig. 5-6.


THE BASIC BINARY OPERATORS

A direct application of the Basic Compressor to the coding of the $\Delta_i$
in Figs. 5-5 and 5-6 yields "the Basic Binary Operators"
$\psi_\beta^{4}[\cdot]$, $\psi_\beta^{5}[\cdot]$ and $\psi_{\beta'}^{4}[\cdot]$,
$\psi_{\beta'}^{5}[\cdot]$. We will investigate these operators further in this
section. Before proceeding, note that $\psi_\beta^{4}[\cdot]$ is really a
special case of $\psi_\beta^{5}[\cdot]$ in which only one Basic Compressor
block is used. Similarly, the $\psi_{\beta'}^{j}[\cdot]$ are really simpler
versions of $\psi_\beta^{j}[\cdot]$ designed to work over half the range of
$p_0$. Thus we can concentrate on $\psi_\beta^{5}[\cdot]$ without loss in
generality.
Fig. 5-6. Operator $\psi_{\beta'}^{j}[\cdot]$ for Limited Range of $p_0$

Implementation of Simplified Rule

Observe from the developments in Chapter II that an essential element in
the generation of Basic Compressor estimates $\gamma_4(\cdot)$ or
$\gamma_5(\cdot)$ is the computation of "fundamental sequence length" F. By
(2-4), this amounts to adding the samples making up a J-sample Basic Compressor
block. The test in (5-20) and Fig. 5-5 suggests that these same computations
are required for each block making up both $\Delta_0$ and $\Delta_1$. However,
the structure of Tables 5-1 and 5-2 can be used to avoid a requirement to add
the samples of $\Delta_1$. The results provide an additional practical argument
for the use of a block size of J = 16.
Two's Complement. Assuming the same partitioning of $\Delta_0$ and $\Delta_1$
into J = 16 sample Basic Compressor blocks let

$$X^i = x_1^i x_2^i \cdots x_{16}^i \tag{5-24}$$

be any such block from $\Delta_i$, with FS length given by (2-4) as

$$F^i = 16 + \sum_{k=1}^{16} x_k^i$$

where i = 0, 1.

By (5-14) we have

$$x_k^1 = 15 - x_k^0 \tag{5-25}$$

We can then write $F^1$ as

$$F^1 = 16 + \sum_{k=1}^{16} (15 - x_k^0) \tag{5-26}$$

or more simply

$$F^1 = 16 + [256 - F^0] \tag{5-27}$$

$$F^1 = 16 + \text{Two's Complement}[F^0] \tag{5-28}$$
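
The shortcut can be illustrated with a small sketch. It assumes that the two's complement is taken on an 8-bit representation of the fundamental sequence length, and it takes the fundamental sequence length of a block to be 16 plus the sum of its samples, as stated above. The function names and the sample block are illustrative assumptions.

```python
J = 16  # Basic Compressor block size assumed in this section

def fs_length(block):
    """Fundamental sequence length of a J-sample block: J plus the sample sum."""
    return J + sum(block)

def fs_length_complement(f0_length):
    """FS length of the complemented block from that of the original block.

    Uses an 8-bit two's complement, which equals 256 - f0_length (mod 256).
    """
    return J + ((-f0_length) & 0xFF)

# Hypothetical block of 4-bit samples and its complement (15 - x for each sample).
x0 = [0, 4, 11, 1, 4, 1, 6, 3, 1, 0, 0, 0, 0, 0, 0, 0]
x1 = [15 - sample for sample in x0]
print(fs_length(x0), fs_length(x1), fs_length_complement(fs_length(x0)))
```

The last two printed values agree, so the samples of the complemented block never need to be added explicitly.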


Expected Performance

Under the rather ideal assumption that all the T samples of D are the
result of a binary memoryless source with a priori unknown but constant $p_0$,
the expected performance of $\psi_\beta^{4}[\cdot]$ and
$\psi_\beta^{5}[\cdot]$ can be bounded. The results of Appendix A provide a
result of the form*

$$\frac{1}{T} E\{\mathscr{L}(\psi_\beta^{j}[D])\} \le A_\beta^{j}(p_0, T) \text{ bits/sample} \tag{5-29}$$

for j = 4 and 5, and $0 < p_0 < 1$.

A plot of $A_\beta^{4}(p_0, 256)$ is given in Fig. 5-7.

Observe from Eq. A-2 that because $p_0$ is assumed constant over the length
of D, additional Basic Compressor blocks of $\psi_\beta^{5}[\cdot]$ actually
degrade performance by increasing the overhead. In this situation there is no
advantage to adapting since the data characterization is the same for all of
D. However, in many practical problems $p_0$ will change over the length of D
and the added flexibility to change code options may more than make up for the
additional overhead. Average performance under $H_\beta(p_0)$, where $p_0$ is
the usual measured average, is a typical result.

Performance of $\psi_{\beta'}^{j}[\cdot]$. With no elaboration necessary, we
have

$$\frac{1}{T} E\{\mathscr{L}(\psi_{\beta'}^{j}[D])\} \le A_\beta^{j}(p_0, T) - \frac{1}{T} \tag{5-30}$$

for j = 4 or 5, provided $p_0$ is limited to either $p_0 > 1/2$ or
$p_0 < 1/2$.

*We have taken the liberty to leave out the additional parameter of $\eta$ for
the number of Basic Compressor blocks in $A_\beta^{5}$.

Fig. 5-7. Expected Performance of $\psi_\beta^{4}[\cdot]$ on Ideal Memoryless
Source, Unknown but Constant $p_0$ (horizontal axis: probability of zero, $p_0$)

Example of $\psi_\beta^{5}[\cdot]$

Let D be the T = 256 sample binary sequence of Fig. 4-5. Following
Fig. 5-5 we obtain D' as

D' = Ext^4[D] = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)*
                (0, 8, 7, 1, 8, 1, 6, 4, 1, 0, 0, 0, 0, 0, 0, 0)*
                (0, 0, 0, 0, 8, 3, 14, 8, 0, 5, 12, 4, 4, 6, 1, 4)*
                (0, 3, 1, 3, 5, 5, 14, 7, 2, 0, 0, 11, 12, 14, 0, 0)      (5-31)

where we have conveniently split D' into four 16 sample blocks. Now applying
$f_0[\cdot]$ of Table 5-2 to each sample of D' yields

Δ_0 = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)*
      (0, 4, 11, 1, 4, 1, 6, 3, 1, 0, 0, 0, 0, 0, 0, 0)*
      (0, 0, 0, 0, 4, 5, 14, 4, 0, 7, 10, 3, 3, 6, 1, 3)*
      (0, 5, 1, 5, 7, 7, 14, 11, 2, 0, 0, 12, 10, 14, 0, 0)               (5-32)

Applying $f_1[\cdot]$ to obtain $\Delta_1$ can be done in several ways. The
simplest is to use Table 5-2 again, yielding

Δ_1 = (15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15)*
      (15, 11, 4, 14, 11, 14, 9, 12, 14, 15, 15, 15, 15, 15, 15, 15)*
      (15, 15, 15, 15, 11, 10, 1, 11, 15, 8, 5, 12, 12, 9, 14, 12)*
      (15, 10, 14, 10, 8, 8, 1, 4, 13, 15, 15, 3, 5, 1, 15, 15)           (5-33)

We could also obtain $\Delta_1$ by using (5-13) and (5-15). For example, take
sample 19 of D'. Its non-binary form is seven = 0111 = $\nu$. Complementing
each bit we get $\bar{\nu}$ = 1000 = eight. Then using Table 5-2 for
$f_0[\cdot]$ we get $f_0(8)$ = 0100 = four = $f_1(7)$.

Following the procedure in (5-15) we first obtain $f_0(7)$ = eleven = 1011.
Complementing each bit we obtain $\overline{f_0(7)}$ = 0100 = four. The same
result.
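
The mapping steps of this example can be replayed in a few lines. The sketch below is illustrative only: `F0_TABLE` transcribes the f_0 column of Table 5-2 (input value to output value), `D_PRIME` transcribes (5-31), and all variable names are assumptions introduced here. The fundamental sequence length of a block is taken as 16 plus the sum of its samples.

```python
# f_0 from Table 5-2: non-binary input 4-tuple value -> non-binary output value.
F0_TABLE = {0: 0, 1: 1, 2: 2, 4: 3, 8: 4, 3: 5, 6: 6, 5: 7,
            10: 8, 9: 9, 12: 10, 7: 11, 11: 12, 13: 13, 14: 14, 15: 15}

# D' from (5-31), split into four 16 sample blocks.
D_PRIME = [
    [0] * 16,
    [0, 8, 7, 1, 8, 1, 6, 4, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 8, 3, 14, 8, 0, 5, 12, 4, 4, 6, 1, 4],
    [0, 3, 1, 3, 5, 5, 14, 7, 2, 0, 0, 11, 12, 14, 0, 0],
]

delta0 = [[F0_TABLE[v] for v in block] for block in D_PRIME]
# f_1 is the 4-bit complement of f_0, i.e. 15 - f_0, by (5-14).
delta1 = [[15 - x for x in block] for block in delta0]
# Fundamental sequence lengths of the four blocks of delta0.
fs0 = [16 + sum(block) for block in delta0]

print(delta0[1], delta1[1], fs0)
```

The computed blocks reproduce (5-32) and (5-33), and sample 19 (the third sample of the second block) maps to four under the complemented lookup, as in the text.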

Code Estimates. $\Delta_0$ and $\Delta_1$ have already been partitioned into
16 sample blocks to accommodate application of the Basic Compressor in the
form of $\psi_5[\cdot]$ in (2-69). To facilitate notation, let

$$\Delta_i = \Delta_i(1) * \Delta_i(2) * \Delta_i(3) * \Delta_i(4) \tag{5-34}$$

where the $\Delta_i(\ell)$ correspond to the partitioned blocks in (5-32) and
(5-33) and let

$$F_\ell^i \tag{5-35}$$

be the fundamental sequence length corresponding to $\Delta_i(\ell)$. Then we
have by (2-74) and (2-28)

$$\gamma_5(\Delta_i) = \sum_{\ell=1}^{4} \left[ 2 + \gamma_4(\Delta_i(\ell)) \right] = 8 + \sum_{\ell=1}^{4} \gamma_4(\Delta_i(\ell)) \tag{5-36}$$

Making use of the procedures developed in Chapter II for Basic Compressor
estimates leads to the results for $\Delta_0$ shown in Table 5-3.

Table 5-3. Estimates for $\Delta_0$

  Block ℓ                    1      2      3      4
  F^0_ℓ                     16     47     76    104
  ID                         0      1      2      3
  Estimate, bits             6     47     58     64
Using Table 5-3, $\gamma_5(\Delta_0)$ becomes

$$\gamma_5(\Delta_0) = 183 \tag{5-37}$$

Making use of (5-27) or (5-28) we see that

$$\gamma_5(\Delta_0) \ll \gamma_5(\Delta_1) \tag{5-38}$$

so that the simplified decision rule of (5-20) provides the choice of

$$\xi = 0 \tag{5-39}$$

(the optimum rule of (5-19) also yields this decision).

Then by (5-21) we have

$$\mathscr{L}(\psi_\beta^{5}[D]) \le 184 \tag{5-40}$$

Coding $\Delta_0$. Making use of the Basic Compressor code decisions in
Table 5-3, the coding of D using $\psi_\beta^{5}[\cdot]$ takes the form

$$\psi_\beta^{5}[D] = 0 * 00 * \psi_0[\Delta_0(1)] * 01 * \psi_1[\Delta_0(2)] * 10 * \psi_2[\Delta_0(3)] * 11 * \psi_3[\Delta_0(4)] \tag{5-41}$$

The resulting coded Basic Compressor blocks are shown in Table 5-4. From the
table we note that the actual coded length falls two bits below the estimate
shown in Table 5-3. Thus

$$\mathscr{L}(\psi_\beta^{5}[D]) = 182 \text{ bits} \tag{5-42}$$

Entropy. From D we obtain the relative frequency of zeroes in D as

$$p_0 = \frac{204}{256} = 0.797 \tag{5-43}$$
Table 5-4. Example: Coded Blocks of $\Delta_0$



0000000001 1 IOIOIOOTIOOOOOOOOOIOOIIOIOII ITT 1 01 1 TOT 01 01 0001 01 00000 













Then using the binary entropy function in (5-2)

$$H_\beta(p_0) = 0.728 \text{ bits/sample} \tag{5-44}$$

By contrast

$$\frac{1}{T} \mathscr{L}(\psi_\beta^{5}[D]) = \frac{182}{256} = 0.711 \text{ bits/sample} \tag{5-45}$$

Optimum decision. It is worth noting that if we had actually used the
optimum Basic Compressor decision rule (counting bits), a different code
operator of $\psi_4[\cdot]$ would have been chosen for block two,
$\Delta_0(2)$, resulting in a reduction of coding bits by 3. Note however that
the advantage obtained by use of the optimum rule is (by observation) typically
much less than this.

Coding of Z. Observe that the coding of D in this example completes the
coding example for non-binary sources initiated in Fig. 4-5. From (4-26) we
have

$$\mathscr{L}(\psi_\theta[\theta]) = 235 \tag{5-46}$$

and using (5-42) we get

$$\mathscr{L}(\psi_9[Z]) = 182 + 235 = 417 \text{ bits} \tag{5-47}$$

for an average of 1.63 bits/sample.
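
The summary figures of this example can be cross-checked from the 4-tuple values of D' listed in (5-31). The sketch below is illustrative; it takes the coded lengths 182 and 235 bits from the text as given and recomputes only the zero count, the binary entropy, and the per sample rates. Variable names are assumptions introduced here.

```python
import math

# 4-tuple values of D' from (5-31), flattened (64 samples of 4 bits each).
D_PRIME = ([0] * 16
           + [0, 8, 7, 1, 8, 1, 6, 4, 1, 0, 0, 0, 0, 0, 0, 0]
           + [0, 0, 0, 0, 8, 3, 14, 8, 0, 5, 12, 4, 4, 6, 1, 4]
           + [0, 3, 1, 3, 5, 5, 14, 7, 2, 0, 0, 11, 12, 14, 0, 0])

T = 4 * len(D_PRIME)                              # length of D in bits
ones = sum(bin(v).count("1") for v in D_PRIME)    # ones in D (non-zero samples of Z)
p0 = (T - ones) / T                               # relative frequency of zeroes

h_beta = -p0 * math.log2(p0) - (1 - p0) * math.log2(1 - p0)
rate_d = 182 / T          # coded bits per sample of D, lengths taken from the text
rate_z = (182 + 235) / T  # coded bits per sample of Z

print(T, ones, round(p0, 3), round(h_beta, 3), round(rate_d, 3), round(rate_z, 2))
```

The recomputed entropy sits slightly above the achieved rate for D, consistent with the comparison drawn between (5-44) and (5-45).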

BINARY OPERATORS FOR VERY LOW ENTROPY

Using the ideal memoryless model with constant $p_0$ as a practical guide,
note that the average performance of $\psi_\beta^{4}[\cdot]$ or
$\psi_\beta^{5}[\cdot]$ in Fig. 5-7 remains within about 0.1 bits/sample of the
entropy $H_\beta(p_0)$ for $0 < p_0 < 1$. At $p_0 = 0$ and $p_0 = 1$ the
performance "bottoms out" at 0.095 bits/sample for the example shown. As
entropy decreases from a maximum of 1.0, this (approximately) 0.1 bits/sample
difference represents an increasing fraction of $H_\beta(p_0)$. In this
section, we seek to improve the efficiency at these low entropy values.

A closer look at the distribution of the non-binary $\Delta_\xi$ samples as
$p_0 \to 1$ (see 5-5) reveals the same situation characterized by Fig. 4-1: a
spike at zero, dominating the distribution. This clearly suggests that the
approach taken in Chapter IV should be directly applicable to the coding of the
$\Delta_\xi$ and hence binary sequence D. This is precisely the approach we
will take. It is suggested that the reader review the operator structure
developed in Chapter IV before proceeding.

Introduction to Operators $\psi_\beta^{9}[\cdot]$ and $\psi_{\beta'}^{9}[\cdot]$

A direct substitution of operator $\psi_9[\cdot]$ from Fig. 4-2 into the block
diagrams for $\psi_\beta^{j}[\cdot]$ and $\psi_{\beta'}^{j}[\cdot]$ in Figs. 5-5
and 5-6 respectively, provides the two new operators shown in Figs. 5-8 and
5-9. A more elaborate expansion of $\psi_9[\cdot]$ is provided in Fig. 5-8 for
discussion purposes. At the same time, we have taken the liberty to expand the
notation in an obvious way.

Recall that any operator indicated by the notation $\psi[\cdot]$ is not
completely specified without reference to a "parameter string" which identifies
its internal parameters such as input block lengths, decision rules, internal
code operators, etc. The $\psi$ really identify a code operator structure.

With this in mind we can make some observations about $\psi_\beta^{9}[\cdot]$
and $\psi_{\beta'}^{9}[\cdot]$.

Operator form. From Figs. 5-8 and 5-9 we have

$$\psi_\beta^{9}[D] = \xi * \psi_\beta[D_\xi] * \psi_\theta[\theta_\xi] \tag{5-48}$$
and

$$\psi_{\beta'}^{9}[D] = \psi_\beta[D_\xi] * \psi_\theta[\theta_\xi] \tag{5-49}$$

where we note that $\psi_\beta^{9}[\cdot]$ and $\psi_{\beta'}^{9}[\cdot]$
differ only in the fact that $\xi$ is chosen during operation by
$\psi_\beta^{9}[\cdot]$ whereas it is preselected for
$\psi_{\beta'}^{9}[\cdot]$.

Block Lengths. With $T = \mathscr{L}(D)$ we have, by (5-3) and (5-4)

$$\mathscr{L}(\Delta_\xi) = \tau = T/4 = \mathscr{L}(D_\xi)$$

Requirements for $\psi_\theta[\cdot]$. Since the samples of $\Delta_\xi$ and
hence $\theta_\xi$ are 4-bit numbers there is no need for split-sample modes by
operator $\psi_\theta[\cdot]$. $\psi_\theta[\cdot]$ can be replaced by a
variable length version of the Basic Compressor, as in (2-70).

Operator $\psi_\beta[\cdot]$. Now consider the block labeled
$\psi_\beta[\cdot]$ within $\psi_\beta^{9}[\cdot]$ in Fig. 5-8. Just as in the
original definition of $\psi_9[\cdot]$, $\psi_\beta[\cdot]$ can in general be
any binary operator (structure) for coding the T/4 sample sequence $D_\xi$,
including $\psi_\beta^{9}[\cdot]$ or $\psi_{\beta'}^{9}[\cdot]$ which we are
currently discussing. Thus the block diagrams in Figs. 5-8 and 5-9 actually
describe an infinite class of possible operators (e.g., the parameter string
for any $\psi_\beta^{9}[\cdot]$ could specify $\psi_\beta^{9}[\cdot]$ as its
internal binary operator $\psi_\beta[\cdot]$, which leads to another
$\psi_\beta^{9}[\cdot]$ and so on). The usual theoretical concerns for such
infinite classes are not our main concern in this paper since our interest here
is to provide code operators for practical use. However, we will return to the
possibility of expanding $\psi_\beta^{9}[\cdot]$ in this manner later.
Expected Performance for $\psi_P^9[\cdot]$ and $\psi_{P'}^9[\cdot]$

From (5-48) and Fig. 5-8, the per sample performance of $\psi_P^9[\cdot]$ can be
written as


19-1 i 1©C J) 

t g ¥ D1) a t + m ^ n 4 + — 




(5-50) 


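where the binary sequence coded by $\psi_P[\cdot]$ has length T/4. Then under
the assumption that all the T samples of D are the result of a binary
memoryless source with a priori unknown but constant $p_0$ (over D), the
expected performance can be bounded by

$$\frac{1}{T} E\{\mathscr{L}(\psi_P^9[D])\} \le A_P^9(p_0, T) = \frac{1}{T} + \frac{1}{4}\left\{ A_\theta\left(a, \tfrac{T}{4}\right) + A_P\left(b, \tfrac{T}{4}\right) \right\} \qquad (5\text{-}51)$$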

where 


and 


and 


«(*■ f) 


(5-52) 

f ) 

* ^ E {- 2 ’4 |6 t j >} 

(5-53) 

$$a = \max(p_0, 1 - p_0) \qquad (5\text{-}54)$$

$$b = a^4 \qquad (5\text{-}55)$$


Similarly, we note that without the need to identify $\zeta$

$$\frac{1}{T} E\{\mathscr{L}(\psi_{P'}^9[D])\} \le A_{P'}^9(p_0, T) = A_P^9(p_0, T) - \frac{1}{T} \qquad (5\text{-}56)$$




The term $A_\theta$ in (5-52) bounds the expected bits (per $\Delta_\zeta$
sample) to code the variable length sequence $\theta_\zeta$. The details are
provided in Appendix B assuming that $\psi_\theta[\cdot] = \psi_6[\cdot]$ and
that $\theta_\zeta$ is treated as a single Basic Compressor block.

The expression $A_P$ in (5-53) bounds the expected performance of a binary
operator $\psi_P[\cdot]$ on a T/4 sample sequence with a priori unknown but
constant probability of a zero given by b in (5-55). Thus a complete
determination of expected performance for $\psi_P^9[\cdot]$ requires that the
internal binary operator $\psi_P[\cdot]$ be specified.

In general, $\psi_P[\cdot]$ could again be any binary operator, including the
one we are currently investigating, $\psi_P^9[\cdot]$. For example, if we let
the internal binary operator of $\psi_P^9[\cdot]$ be $\psi_P^9[\cdot]$, the
bound in (5-51) could be expanded to

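$$A_P^9(p_0, T) = \frac{1}{T} + \frac{1}{4}\left\{ A_\theta\left(a, \tfrac{T}{4}\right) + A_P^9\left(b, \tfrac{T}{4}\right) \right\}$$

where

$$A_P^9\left(b, \tfrac{T}{4}\right) = \frac{1}{T/4} + \frac{1}{4}\left\{ A_\theta\left(a', \tfrac{T}{16}\right) + A_P\left(b', \tfrac{T}{16}\right) \right\} \qquad (5\text{-}57)$$

and by (5-54) and (5-55)

$$a' = \max(b, 1 - b) \qquad (5\text{-}58)$$

and

$$b' = (a')^4 \qquad (5\text{-}59)$$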


Clearly the expected performance of a large class of binary operators can 


be derived by appropriate substitution of parameters. 
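
The parameter substitution can be sketched numerically. The following is a minimal illustration, not the report's procedure: it assumes that each SPLIT level sees a zero probability a = max(p, 1 - p) after the complement decision, and that the next level sees b = a^4, the probability of an all-zero 4-tuple (the reading of (5-55) suggested by Appendix B). The function name `split_parameters` is hypothetical.

```python
def split_parameters(p0, levels):
    """Return [(a, b), (a', b'), ...] for successive SPLIT levels.

    Assumption: b = a**4 is the probability of an all-zero 4-tuple,
    which becomes the probability of a zero for the next level.
    """
    params = []
    p = p0
    for _ in range(levels):
        a = max(p, 1.0 - p)   # zero probability after the complement decision
        b = a ** 4            # probability of an all-zero 4-tuple
        params.append((a, b))
        p = b                 # the next level codes a source with P(zero) = b
    return params


if __name__ == "__main__":
    for level, (a, b) in enumerate(split_parameters(0.95, 3)):
        print(f"level {level}: a = {a:.4f}, b = {b:.4f}")
```

Each (a, b) pair would supply the arguments of the $A_\theta$ and $A_P$ terms at the corresponding level of the expanded bound.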



Example. An interesting example is provided by using the basic binary
operators $\psi_P^4[\cdot]$ or $\psi_P^5[\cdot]$ in the internal structure of
$\psi_P^9[\cdot]$. The expected performance of these operators was investigated
earlier and displayed graphically in Fig. 5-7. Using such an operator as the
internal binary operator of $\psi_P^9[\cdot]$ yields the results shown in
Fig. 5-10, assuming an input block length of T = 256. The graph was obtained by
replacing the last term in (5-51) by the corresponding bound evaluated at
(b, 64). The corresponding results for $\psi_P^4[\cdot]$ directly operating on
D are repeated for comparison purposes.


Introduction to Operators $\psi_P^{10}[\cdot]$ and $\psi_{P'}^{10}[\cdot]$

By Fig. 5-10, $\psi_9[\cdot]$ operating on $\Delta_\zeta$ performs better than
$\psi_5[\cdot]$ for a range of intermediate values of $p_0$. Following our
usual procedure for such situations we can simply choose between these
operators. This amounts to replacing $\psi_5[\cdot]$ internal to
$\psi_P^5[\cdot]$ (Fig. 5-5) and $\psi_{P'}^5[\cdot]$ (Fig. 5-6) by
$\psi_{10}[\cdot]$. The resulting block diagrams of binary operators
$\psi_P^{10}[\cdot]$ and $\psi_{P'}^{10}[\cdot]$ are shown in Figs. 5-11 and
5-12.

Operator form . From the figures we have 


^°[d] = 


(5-60) 


and 


*p?[D]= [S^] 


(5-61) 


where $\lambda_1$ is either 9 or 5. Again the only difference between
$\psi_P^{10}[\cdot]$ and $\psi_{P'}^{10}[\cdot]$ is that $\zeta$ is chosen
during operation by $\psi_P^{10}[\cdot]$ whereas it must be a priori selected
for $\psi_{P'}^{10}[\cdot]$.


V 1 or >4.('] replaces the ^ 8 ( •) assumed in Fig. 4-3 here since the alphabet 
size of A ^ is only 16 so that no split-sample modes are needed. Recall that 
these operators are really special cases of «p 0 [*). 

O 
Fig. 5-10. Bounds to Expected Performance of $\psi_P^9[\cdot]$ on Ideal
Memoryless Source, Unknown but Constant $p_0$
Fig. 5-11. Basic Block Diagram, Operator $\psi_P^{10}[\cdot]$

Decision rules for $\zeta$. A selection of decision rules for choosing $\zeta$
is described in (5-19) - (5-22). The simplified rule of (5-20) is preferred in
this case, and can be further simplified.

A direct application of (5-20) would require that $\zeta$ be chosen such that


WV = ™ l o” , 


(5-62) 


5 9 

But by considerations of expected performance of • ] and 4>pl • ] just investi- 
gated and practical observation, whenever the internal decision rule of 
would choose operator [ • ] instead of I we would also find that 


V 9 (A c ) < Y 5 (^) < Y l0 (^) 


(5-63) 


where l, = COMP[£j. Thus there is no need to consider in the determina- 

tion of The decision rule in (5-62) reduces to choosing £> such that 




(5-64) 


Internal decision rules. As in Chapter IV the decision rules for selecting
either $\psi_5[\cdot]$ or $\psi_9[\cdot]$ may be simplified in some
applications to reduce the required buffer size for $\theta_\zeta$ samples.
Following (4-24) and (4-25) the rule for choosing $\lambda_1$ is:

Choose $\lambda_1 = 5$ if $\mathscr{L}(\theta_\zeta) \ge B_\theta$.

Otherwise choose $\lambda_1$ such that

$$\gamma_{\lambda_1}(\Delta_\zeta) = \min_{i = 5,\, 9} \gamma_i(\Delta_\zeta) \qquad (5\text{-}65)$$

where the $\theta_\zeta$ buffer size $B_\theta$ can be experimentally chosen so
that the loss in performance is acceptable. Basically, this rule forces a
decision to use $\psi_5[\cdot]$ at intermediate values of $p_\zeta$. By
Fig. 5-10, $\psi_5[\cdot]$ is either the best choice or a good second best for
a wide range of $p_\zeta$.

o 

Choosing B quite small means the 9 [•] is used only to improve 

performance for very high or low values of pQ. Under these conditions an addi- 

9 

tional simplification results because the internal will never be chosen 

unless p^ > 1/2. Thus 4^1*] can be replaced by The particular options 

chosen for implementation will depend on the particular application which may 
not have such ideal stationary statistics. 



Expected Performance. A bound to the expected performance of operators
$\psi_P^{10}[\cdot]$ and $\psi_{P'}^{10}[\cdot]$, assuming D arises from an
ideal binary memoryless source with unknown but constant $p_0$, is easily
obtained from previous results. We have

$$\frac{1}{T} E\{\mathscr{L}(\psi_P^{10}[D])\} \le A_P^{10}(p_0, T) \qquad (5\text{-}66)$$


where 


and as usual 


A p°(P<y T) = + min^A* (p ; , T), A^,(p^, T)} (5-67) 


$$\zeta = \begin{cases} 0 & \text{if } p_0 \ge 1/2 \\ 1 & \text{otherwise} \end{cases} \qquad (5\text{-}68)$$


Again we have assumed that $\psi_{P'}^5[\cdot]$ incorporates a single Basic
Compressor block, making it equivalent to $\psi_{P'}^4[\cdot]$. A graph of
$A_P^{10}(p_0, 256)$ would essentially be the lower envelope of the curves in
Fig. 5-10, provided $\psi_P^{10}[\cdot]$ is terminated after one SPLIT[·]
operation and the buffer is not restricted.

Similarly, without the need to identify $\zeta$, $A_{P'}^{10}(p_0, T)$ is given
by

$$A_{P'}^{10}(p_0, T) = A_P^{10}(p_0, T) - \frac{1}{T} \qquad (5\text{-}69)$$
Multi-Level $\psi_P^{10}[\cdot]$ and $\psi_{P'}^{10}[\cdot]$

As noted earlier any operator structure which includes $\psi_9[\cdot]$ in its
definition really defines an infinite class of operators. This is because the
SPLIT[·] operation results in a new internal binary source which must be coded
by some binary operator $\psi_P[\cdot]$. Since $\psi_P[\cdot]$ may again
contain the SPLIT[·] operation, the procedure can be repeated over and over
again. But the number of binary samples decreases by a factor of four at each
branch into this "tree of code operators." Thus the incremental impact on the
computation required for decision making at each step diminishes rapidly.

We provide an illustration of the procedures by expanding $\psi_P^{10}[\cdot]$
several times, yielding an operator which would code a T = 1024 bit all zero or
all ones sequence with only 11 bits. A block diagram is provided in Fig. 5-13
where we have taken the liberty to expand notation in an obvious way.
Additionally, we have also elected to accept some slight loss in performance
for this example by choosing internal binary operators $\psi_{P'}^{10}[\cdot]$
and $\psi_{P'}^5[\cdot]$ in the expansion instead of $\psi_P^{10}[\cdot]$ and
$\psi_P^5[\cdot]$. By earlier discussions (see 5-65) the two $\theta$ buffers
in Fig. 5-13 can be substantially reduced.

Expansion of $\psi_P^{10}[\cdot]$. The creation of multilevel operators such as
$\psi_P^{10}[\cdot]$ in Fig. 5-13 is accomplished by continually replacing the
binary operator $\psi_P[\cdot]$ which follows a SPLIT[·] by another which also
includes a SPLIT[·]. For the example in Fig. 5-13, we have


in 


4p°[D] = [2L C : 


] 


(5-70) 


where $\lambda_1$ equals 5 or 9 just as in Fig. 5-11. If $\lambda_1 = 9$ we would have


1 


*p °[d] = 


lOrft 


(5-71) 
since we have used $\psi_{P'}^{10}[\cdot]$ as the internal binary operator of
$\psi_9[\cdot]$. Expanding $\psi_{P'}^{10}[\cdot]$ we have

+p? [i V = V+xA 1 


( 5 * 72 ) 


where again $\lambda_2$ equals 5 or 9. If $\lambda_2 = 9$

1 Of i \ j # , 5 r r\ 


+pA ] = 0 l*4. e [e c 0 ] 


(5-73) 


since we have terminated expansion by using $\psi_{P'}^5[\cdot]$ as the binary
operator for $\psi_9[\cdot]$ (i.e., $\psi_{P'}^5[\cdot]$ contains no SPLIT[·]
operation).†

If the code decisions are actually = \^ = 5 (very low or high Pq) the 
final expanded form of d^fD] is given as 


+j°[D] = 


5 rR 


(5-74) 


All zero input sequence. If we take the length of D as T = 1024 and assume it
is either all zeroes or all ones, $\psi_P^{10}[D]$ will take the form in (5-74)
with both $\psi_\theta[\cdot]$ terms contributing zero bits. Since the
innermost sample sequence reduces to a 16 sample all zeroes sequence,
$\psi_{P'}^5[\cdot]$ will require only 8 bits. Adding in the 3 bits for
$\zeta$, $\lambda_1$ and $\lambda_2$ we have

$$\mathscr{L}(\psi_P^{10}[D = \text{all zeroes or all ones}]) = 11 \text{ bits} \qquad (5\text{-}75)$$

or approximately 0.01 bits/sample.
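
The arithmetic behind this example can be checked with a short sketch. It assumes only what the text states: each level groups its binary input into 4-tuples, so the sample count drops by a factor of four per level, the innermost 16 sample block costs 8 bits, and 3 overhead bits are added. The 8 and 3 are taken from the text rather than derived, and the function name `split_level_sizes` is hypothetical.

```python
def split_level_sizes(total_bits, levels):
    """Sample counts seen at successive levels when each level groups
    its binary input into 4-tuples (a factor of four per level)."""
    sizes = []
    n = total_bits
    for _ in range(levels):
        n //= 4
        sizes.append(n)
    return sizes


T = 1024
sizes = split_level_sizes(T, 3)       # sample counts at the three levels

INNERMOST_BITS = 8    # stated in the text for the 16 sample all-zero block
OVERHEAD_BITS = 3     # stated in the text: one bit per coding decision

total_bits = INNERMOST_BITS + OVERHEAD_BITS
rate = total_bits / T                 # bits per input sample

if __name__ == "__main__":
    print(sizes, total_bits, round(rate, 4))
```

The resulting rate of 11/1024 is the "approximately 0.01 bits/sample" quoted above.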


†Note that using $\psi_P^{10}[\cdot]$ instead would change (5-73) to the form


♦ p 0 ®; 1 = 


5 


(5-76) 
Fig. 5-13. Block Diagram, Multi-Level $\psi_P^{10}[\cdot]$
Additional Adaptivity

In many real problems $p_0$ may not only vary but the manner in which it
changes may also vary. Whereas the algorithms described in this chapter will
typically perform well under a measured average binary entropy in these
situations, in many cases it may be desirable to provide additional adaptivity.
Such modifications and extensions of the preceding developments will be the
subject of future reports.





APPENDIX A 

STATISTICAL PERFORMANCE BOUNDS 
FOR $\psi_P^4[\cdot]$ AND $\psi_P^5[\cdot]$

Here we develop bounds on the expected performance of binary operators
$\psi_P^4[\cdot]$ and $\psi_P^5[\cdot]$ on T sample binary sequences D (5-1)
under the assumption that the samples of D are the result of an ideal binary
memoryless source with a priori unknown but constant probability of a zero,
$p_0$, where $0 \le p_0 \le 1$. Specifically, we develop the result

$$\frac{1}{T} E\{\mathscr{L}(\psi_P^j[D])\} \le A_P^j(p_0, T, \eta) \text{ bits/sample} \qquad (A\text{-}1)$$


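where $\eta$ refers to the number of Basic Compressor blocks used by
$\psi_P^j[\cdot]$ to code $\Delta_\zeta$ in Fig. 5-5. $A_P^j$ is given by

$$A_P^j(p_0, T, \eta) = \frac{1 + 2\eta}{T} + \frac{1}{T}\gamma_D(\bar{F}_\zeta) \qquad (A\text{-}2)$$

where

$$\bar{F}_\zeta = \frac{T}{4}\left[ p_\zeta^4 + 14 p_\zeta^3 (1 - p_\zeta) + 51 p_\zeta^2 (1 - p_\zeta)^2 + 54 p_\zeta (1 - p_\zeta)^3 + 16 (1 - p_\zeta)^4 \right] \qquad (A\text{-}3)$$

is a fundamental sequence length for a J = T/4 Basic Compressor block and

$$\zeta = \begin{cases} 0 & \text{if } p_0 \ge 1/2 \\ 1 & \text{otherwise} \end{cases} \qquad (A\text{-}4)$$

$\gamma_D$ is evaluated using (2-27).
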
DERIVATION OF $A_P^j$

First observe that operator $\psi_P^4[\cdot]$ is a special case of operator
$\psi_P^5[\cdot]$ for which only one Basic Compressor block is used
($\eta = 1$). Thus, we can henceforth assume j = 5.

As in (2-67) and (2-68), $\Delta_\zeta$ can be partitioned into $\eta$ Basic
Compressor blocks so that

$$\Delta_\zeta = \Delta_\zeta(1) * \Delta_\zeta(2) * \cdots * \Delta_\zeta(\eta) \qquad (A\text{-}5)$$

and where the $\Delta_\zeta(\ell)$ consist of $J_\ell$ samples satisfying

$$\frac{T}{4} = \sum_{\ell=1}^{\eta} J_\ell \qquad (A\text{-}6)$$

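Now tracing through the appropriate equations, by (5-21)

$$\mathscr{L}(\psi_P^5[D]) \le 1 + \gamma_5[\Delta_\zeta] \qquad (A\text{-}7)$$

Denoting the fundamental sequence length for $\Delta_\zeta(\ell)$ by
$F_\zeta^\ell$ and using (2-29) and (2-74) we have

$$\frac{1}{T} E\{\mathscr{L}(\psi_P^5[D])\} \le \frac{1 + 2\eta}{T} + \frac{1}{T}\sum_{\ell=1}^{\eta} E\{\gamma_D(F_\zeta^\ell)\} \qquad (A\text{-}8)$$

where the first term is simply the overhead in identifying $\zeta$ and Basic
Compressor code options for the $\eta$ blocks.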

But by (2-32) we have

$$\sum_{\ell=1}^{\eta} E\{\gamma_D(F_\zeta^\ell)\} \le \sum_{\ell=1}^{\eta} \gamma_D(\bar{F}_\zeta^\ell) \qquad (A\text{-}9)$$



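Using the binary memoryless assumption and recalling the mapping of D into
$\Delta_\zeta$, $\bar{F}_\zeta^\ell$ is given as

$$\bar{F}_\zeta^\ell = E\{F_\zeta^\ell\} = J_\ell\left[ p_\zeta^4 + 14 p_\zeta^3 (1 - p_\zeta) + 51 p_\zeta^2 (1 - p_\zeta)^2 + 54 p_\zeta (1 - p_\zeta)^3 + 16 (1 - p_\zeta)^4 \right] \qquad (A\text{-}10)$$

where quite obviously

$$\bar{F}_0^\ell \le \bar{F}_1^\ell \quad \text{for } p_0 \ge \tfrac{1}{2} \qquad (A\text{-}11)$$

identifying the choice of $\zeta$ as in (A-4).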
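
The coefficients 1, 14, 51, 54, 16 can be reproduced by enumeration. The sketch below assumes that the sixteen binary 4-tuples are mapped to symbols 0 through 15 in order of increasing number of ones, and that a symbol of value i contributes i + 1 bits to the fundamental sequence. The ordering within a group of equal weight is an arbitrary choice here and does not affect the sums. The function names are hypothetical.

```python
from itertools import product


def fs_coefficients():
    """Sum of fundamental sequence contributions (symbol value + 1),
    grouped by the number of ones in the originating 4-tuple."""
    # Assumed mapping: 4-tuples ranked by number of ones, ties arbitrary.
    ranked = sorted(product((0, 1), repeat=4), key=lambda t: (sum(t), t))
    coeffs = [0] * 5
    for symbol, bits in enumerate(ranked):
        coeffs[sum(bits)] += symbol + 1
    return coeffs


def expected_fs_per_sample(p_zero):
    """Bracketed polynomial of (A-10) for zero probability p_zero."""
    coeffs = fs_coefficients()
    return sum(c * p_zero ** (4 - k) * (1 - p_zero) ** k
               for k, c in enumerate(coeffs))


if __name__ == "__main__":
    print(fs_coefficients())
    for p0 in (0.5, 0.7, 0.9, 0.99):
        # No complement versus complement, for a source with p0 >= 1/2.
        print(p0, expected_fs_per_sample(p0), expected_fs_per_sample(1 - p0))
```

For any $p_0 \ge 1/2$ the first value printed is no larger than the second, which is the ordering claimed in (A-11).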

Substituting $\bar{F}_\zeta^\ell$ in the $\gamma(\cdot)$ of (2-23) - (2-26) we
see that each function is a multiple of block size (ignoring truncation). Thus
choosing the minimum in (2-28) is independent of block size. Operator
decisions, ID, are the same for each of the $\eta$ blocks. Then the right hand
side of (A-9) can more simply be replaced by a single T/4 sample Basic
Compressor block with expected fundamental sequence length given by
$\bar{F}_\zeta$ in (A-3) with $\zeta$ determined by (A-4).


Comment 

The reduction of the right hand side of (A-9) to a single Basic Compressor
block should come as no surprise. Because of the ideal memoryless model,
data statistics do not change over D. There is no advantage to adapting because
the extra code blocks just cost in overhead (i.e., $2\eta/T$ in A-8). However,
if data statistics were actually changing over D, as in many practical
situations, the extra adaptivity might more than make up for the additional
overhead.
APPENDIX B

STATISTICAL PERFORMANCE BOUND
ON $\psi_\theta[\theta_\zeta] = \psi_6[\theta_\zeta]$


Here we provide a bound to the expected bits/sample required to code
$\theta_\zeta$, where $\theta_\zeta$ is the result of the SPLIT[·] operator on
$\Delta_\zeta$ for operator $\psi_P^9[\cdot]$ or $\psi_{P'}^9[\cdot]$ in
Figs. 5-8 and 5-9 respectively, and the T-sample input sequence D is assumed to
be the result of a binary memoryless source with a priori unknown but constant
probability of a zero, $p_0$ (over D). Operator
$\psi_\theta[\cdot] = \psi_6[\cdot]$ from (2-70) will be assumed to consist of
only one Basic Compressor block, equal in length to
$\mathscr{L}(\theta_\zeta)$.

The desired result is of the form (see B-10)


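$$\frac{1}{T/4} E\{\mathscr{L}(\psi_\theta[\theta_\zeta])\} \le A_\theta\left(p_\zeta, \tfrac{T}{4}\right) \qquad (B\text{-}1)$$

where

$$\zeta = \begin{cases} 0 & \text{if } p_0 \ge 1/2 \\ 1 & \text{otherwise} \end{cases} \qquad (B\text{-}2)$$

$$p_\zeta = a = \max(p_0, 1 - p_0) \qquad (B\text{-}3)$$

and

$$p_{\zeta 0} = b = p_\zeta^4 \qquad (B\text{-}4)$$

We recognize $p_\zeta$ as the probability of a zero in any bit position of
$\Delta_\zeta$, that is, after the decision $\zeta$. Similarly, $p_{\zeta 0}$
is the probability of an all zero 4-tuple symbol in $\Delta_\zeta$. The
additional notation a, b is provided to allow easier usage in the main text.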




\ 

\ 

} 

a ^ 

it 
t " 


i > 


j /! 

§ *• 


| 


» V 


! 

* 



f 

I 


(* 
DERIVATION OF $A_\theta(p_\zeta, T/4)$


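By (2-28) and (2-39)

$$E\{\mathscr{L}(\psi_\theta[\theta_\zeta])\} \le 2 + \gamma_D(F_\theta, J_\theta)$$

where

$$F_\theta = E\{\mathscr{L}(\psi_1[\theta_\zeta])\}$$

and as in (4-9)

$$J_\theta = E\{\mathscr{L}(\theta_\zeta)\} = \frac{T}{4}\left(1 - p_{\zeta 0}\right)$$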


The expression in (B-3) is easily evaluated by noting that a sample of
$\Delta_\zeta$ of value i will contribute i bits to the fundamental sequence
for $\theta_\zeta$. Then by summing the expected contributions for all T/4
samples we have

$$F_\theta = \frac{T}{4}\left[ 10 p_\zeta^3 (1 - p_\zeta) + 45 p_\zeta^2 (1 - p_\zeta)^2 + 50 p_\zeta (1 - p_\zeta)^3 + 15 (1 - p_\zeta)^4 \right]$$
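
As a check on these coefficients, the following sketch enumerates the sixteen sample values under the assumption that values are assigned in order of increasing number of ones in the 4-tuple (1 value with no ones, then 4, 6, 4 and 1 values), with a sample of value i contributing i bits. It also evaluates the expected number of nonzero samples, taken here as (T/4)(1 - p^4), which is the reading of $J_\theta$ suggested by the text. The function names are hypothetical.

```python
from math import comb


def theta_fs_coefficients():
    """Sum of sample values grouped by the number of ones in the 4-tuple."""
    coeffs = [0] * 5
    value = 0
    for ones in range(5):
        for _ in range(comb(4, ones)):   # number of 4-tuples with this many ones
            coeffs[ones] += value        # a sample of value i contributes i bits
            value += 1
    return coeffs


def expected_theta_fs(T, p_zero):
    """Expected fundamental sequence length for theta over T/4 samples."""
    coeffs = theta_fs_coefficients()
    per_sample = sum(c * p_zero ** (4 - k) * (1 - p_zero) ** k
                     for k, c in enumerate(coeffs))
    return (T / 4) * per_sample


def expected_nonzero_samples(T, p_zero):
    """Expected length of theta: T/4 samples, each nonzero w.p. 1 - p^4."""
    return (T / 4) * (1 - p_zero ** 4)


if __name__ == "__main__":
    print(theta_fs_coefficients())
    print(expected_theta_fs(1024, 0.9), expected_nonzero_samples(1024, 0.9))
```

When the zero probability is exactly 1 both quantities vanish, matching the remark that the bound may then be replaced by zero.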


F 0 can be written into a more convenient form by factoring out p^ 0 


yielding 


F e = 3 e 


where 




1^- J jlOp 3 + 45 P 2 (1 - p c ) + 50p^(l - p^) 2 


+ 15(1 - p ; ) 
Bound Expression

Substituting (B-3) and (B-6) into (B-2) using (2-23) - (2-27)* we get after
simplification

t / *7 ^ 

4 A e (p C 4 ) = 2 + j 0 min|^f2 ; - 2, + 2, 4 } (B-10) 

and we observe that this bound may be replaced by zero if $p_\zeta$ is
precisely 1 (there would be no $\theta_\zeta$ samples).


T 


Note that y^ = 4 for this problem. 